Where are CIOs and chief data officers (CDOs) in their journeys to build out their data chops? How advanced is their data defense? Here's what participants in a recent #CIOChat had to say about data readiness.
Several years ago, Tom Davenport defined the notions of data defense and data offense. The following chart summarizes where data activities fit into data defense and data offense.
Building a Data Defense Strategy
“IT should be the stewards of data veracity,” said former CIO Mike Kail, who kicked off the discussion. Analyst Dion Hinchcliffe supported this idea, suggesting that IT take a leadership role in every data defense with the possible exception of source tracking, which he believes should be shared with anyone collecting data. "Cleanliness is a shared task," Hinchcliffe said. "But in my opinion, the rest should be job number one for IT.”
Hinchcliffe stated that IT's top data struggles are due to: 1) application silos; 2) poor integration; 3) lack of master data; 4) poor quality sources/ingestion processes; 5) limited control over underlying databases; and 6) cut and paste processes. Given these struggles, he suggested that "data defense should be about a lot more than just the integrity of data."
Hinchcliffe’s list includes the following data topics:
- Data source tracking + verification
- Data cleanliness
- Data integrity
- Data protection
- Data security
- Data trust
- Master data reconciliation/deduping
Former CIO Tim McBreen agreed with Hinchcliffe, saying he still sees a lot of issues occurring with master data reconciliation. "IT organizations tend to fix things too far down the data path in analytics vs. transactional or operational systems,” he explained. Kail shared the same concern, suggesting that “having secure, automated data ingest pipelines that allow pollutants turns data streams and lakes into murky swamps.”
Given this, it is unsurprising that former CIO Theresa Rowe thinks patient intake in emergency rooms presents a role model for data intake. "Existing health records must be accurately matched. Data updates must be fast and correct. Only when this is the case is data ready for population and individual analysis and research.”
In terms of the responsibilities for data management, CIO Stephen diFilipo argued that “data defense is the responsibility of the entire organization. If you touch data, then there is an obligation to ensure data integrity. It all starts and ends with well structured, enterprise level data governance.” However, former CIO Peter Weis claimed, “The hard reality is that even though data needs to be a participation sport across the enterprise, the CIO above everyone else is accountable. And where the CIO attempts to defer or deflect responsibility, it will come across as small and defensive.”
The Data Struggle Is Real
Usage is the biggest data issue for CIO Dennis Klemenz. "Exporting to manipulate data is critical for ad hoc analysis but integrity can become an issue as more analyze data. That's why data lineage is so important. Lineage should answer where the data came from. And keeping the lineage with data tied to trusted sources helps.”
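One way to picture Klemenz's point about keeping lineage tied to the data is the minimal Python sketch below. It is not drawn from the discussion: the class names (`LineageStep`, `TrackedDataset`), the source name `crm_system_of_record`, and the sample rows are all hypothetical, chosen only to show an exported subset that can still answer "where did this come from?"

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class LineageStep:
    """One hop in a dataset's history: the source it traces to and what was done."""
    source: str
    operation: str


@dataclass
class TrackedDataset:
    """Rows plus the ordered list of steps that produced them."""
    rows: list[dict[str, Any]]
    lineage: list[LineageStep] = field(default_factory=list)

    def origin(self) -> str:
        # The first recorded step names the trusted source system.
        return self.lineage[0].source if self.lineage else "unknown"

    def derive(self, operation: str, rows: list[dict[str, Any]]) -> "TrackedDataset":
        # A derived dataset inherits the parent's lineage and appends one step,
        # so an exported copy can always be traced back to its origin.
        step = LineageStep(source=self.origin(), operation=operation)
        return TrackedDataset(rows=rows, lineage=[*self.lineage, step])


# Hypothetical source name and rows, for illustration only.
crm = TrackedDataset(
    rows=[{"customer": "A", "spend": 120}, {"customer": "B", "spend": 45}],
    lineage=[LineageStep(source="crm_system_of_record", operation="extract")],
)
high_spend = crm.derive(
    "filter spend > 100", [r for r in crm.rows if r["spend"] > 100]
)
print(high_spend.origin(), [s.operation for s in high_spend.lineage])
```

The design choice here is that lineage travels with the rows rather than living in a separate catalog, so an analyst working on an ad hoc extract does not have to look anything up to see which system of record it traces to.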
Klemenz isn't alone in the struggle. CIO David Seidl said, “I see the stumbling blocks around data definitions. Divisions need data that fits their needs. Organizations need data defined. When two sources don't align and aren't called out, you end up with data integrity issues. And when you can't trust your data, you shouldn't use it.” Meanwhile, diFilipo finds visualization and reporting to be a constant struggle regarding data usefulness at his organization. "Those without process thinking tend to not understand the nuances and complexities of data structures and relationships to properly surfacing data as information,” he said.
He added that data definitions are particularly challenging in higher education. "The Integrated Postsecondary Education Data System (IPEDS), State, Federal and universities all have differing definitions and values for what appear to be the same data points,” he explained. As an example, diFilipo used defining the field for 'student.' "I start by determining which is the system of record and where in the student journey does each system function within that system of record.”
Seidl agreed. “Defining 'student' helps people at our organization understand what they thought was settled fact, [which] wasn't the reality others were living in," he said. "With this, we can talk systems of record, having recognized there's a gap and a need to solve it.”
According to Hinchcliffe, “Lines of business are pretty good at being the owner of their data." But, he continued, "They are unaware of or often unwilling to be data steward beyond their own function." For this reason, he said, the line of business doesn't appreciate the highly strategic nature of digital data. "As a former enterprise architect and integration lead, I've seen horrors that come from delivering data integrity, even within key systems. This includes the dozen customer databases, unstructured data crammed into structured fields, etc. Approximately 60% of the work on AI projects remains focused upon data wrangling."
The State of Data Governance Management
McBreen personally has had good luck in building data governance programs, including having business representatives step up to be data stewards. "They need coaching from IT custodians but are able to work pretty well," he said. "This allows for the finding of problems when business is doing their own audits, etc. I have seen it work well in the 20% of clients [for whom] I have either built or audited data governance programs. The ones that worked best seemed to have a great rotation of stewards brought into metadata along with data hubs for managing master data.”
However, Klemenz said, “Some are better than others. Data governance is about getting everyone on the same page regarding data and data usage. Some businesses do this well with governance committees and metadata. Others just use descriptive field names. There is no one way to govern data.”
Hinchcliffe said, “there is vast, growing data sprawl as IT proliferates,” and effective data governance is usually an afterthought. Its benefits are seen as indirect, it has no direct ROI and no executive sponsor, and projects are already in progress without guidance. Seidl simply stated, “Data governance is generally not well managed.”
Data governance is especially tough to maintain given the sheer volume of new data and data sources being added at the enterprise level. "Today, most organizations lose the daily battle with master data and data governance as they accumulate an average of two-to-three new IT systems a week," Hinchcliffe said. "Data integrity is a bit better because it's inherent in the testing of most systems. This means automation is the key to broadly effective data governance. Only the unblinking gaze of digital data detectives can continuously track and identify issues and opportunities, and ensure a safety net.”
How CIOs Can Make Data Ready and Useful
McBreen also believes automation is key to success. “We built spoke/hub pipes from all applications (shadow or not) that were important to the enterprise," he explained. "That allowed us to automate rules for cleansing and merging data. Where this isn’t the case, it should produce an error database for stewards to resolve.”
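The hub-and-spoke pattern McBreen describes can be sketched roughly as follows. This is an illustration under stated assumptions, not his implementation: the rule functions, the field names (`customer_id`, `email`, `plan`), the source labels, and the in-memory `errors` list standing in for an "error database" are all hypothetical.

```python
from typing import Callable, Optional

Record = dict[str, str]
# A rule returns an error message when a record fails, or None when it passes.
Rule = Callable[[Record], Optional[str]]


def require_customer_id(record: Record) -> Optional[str]:
    return None if record.get("customer_id", "").strip() else "missing customer_id"


def require_email(record: Record) -> Optional[str]:
    return None if "@" in record.get("email", "") else "missing or malformed email"


def cleanse(record: Record) -> Record:
    # Crude normalisation (an assumption): trim and lowercase every value so
    # records arriving from different spokes compare equal.
    return {key: value.strip().lower() for key, value in record.items()}


def run_hub(incoming: list[tuple[str, Record]], rules: list[Rule]):
    """Merge clean records by customer_id; route failures to an error list."""
    master: dict[str, Record] = {}
    errors: list[dict] = []
    for source, raw in incoming:
        record = cleanse(raw)
        problems = [msg for rule in rules if (msg := rule(record)) is not None]
        if problems:
            # Stand-in for the error database that stewards would work through.
            errors.append({"source": source, "record": raw, "problems": problems})
            continue
        # Later sources fill in or overwrite fields on the same customer.
        master.setdefault(record["customer_id"], {}).update(record)
    return master, errors


incoming = [
    ("crm", {"customer_id": " C1 ", "email": "Ann@Example.com"}),
    ("billing", {"customer_id": "c1", "email": "ann@example.com", "plan": "Gold"}),
    ("shadow_sheet", {"customer_id": "", "email": "no-address"}),
]
master, errors = run_hub(incoming, [require_customer_id, require_email])
print(master)
print(errors)
```

In this sketch the two valid records merge into a single master entry, while the record from the shadow spreadsheet is held back with the reasons attached, which is the part a data steward would pick up.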
Meanwhile, Seidl recommends figuring out what data is truly critical for your organization. "Define, govern and manage it. Get it right, make it valuable. Take a step forward, do it again. Build habits around it, build culture around it. Show more value. Make it accessible. Then iterate. This leads to a whole different conversation about systems of record, and which way the stream should flow. Oh, and how many systems of record you can have before something really weird happens."
Seidl advised prioritizing your data and its key attributes, then concentrating on those first, because it is easier to get buy-in with a win. He suggests conducting quarterly audits, similar to security audits. "It's just as important to know that your data is high quality,” he explained.
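As a rough idea of what a recurring data quality audit might measure, the Python sketch below reports completeness for a few key attributes and lists duplicated keys. The function name `audit`, the metrics chosen, and the sample `student_id` and `email` fields are assumptions made for illustration; the discussion did not specify what such an audit should contain.

```python
from collections import Counter


def audit(records, key, required):
    """Return simple quality metrics for a list of dict records."""
    total = len(records)
    if total == 0:
        return {"rows": 0, "completeness": {}, "duplicate_keys": []}
    # Share of rows where each required attribute is present and non-empty.
    completeness = {
        name: sum(1 for r in records if r.get(name) not in (None, "")) / total
        for name in required
    }
    # Key values that appear on more than one row.
    counts = Counter(r.get(key) for r in records)
    duplicates = sorted(k for k, n in counts.items() if n > 1 and k is not None)
    return {"rows": total, "completeness": completeness, "duplicate_keys": duplicates}


# Hypothetical sample rows.
records = [
    {"student_id": "s1", "email": "a@x.edu"},
    {"student_id": "s2", "email": ""},
    {"student_id": "s1", "email": "a@x.edu"},
    {"student_id": "s3"},
]
report = audit(records, key="student_id", required=["student_id", "email"])
print(report)
```

Running the same report each quarter against the same priority attributes would give a trend line, which is one way to show the kind of early win Seidl mentions.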
Hinchcliffe agreed. "It is important to own the problem and prioritize it," he said. "Issues in resolving the overall data governance issues include risk, expense and ballooned technical debt. Payoffs take 2-3 years, which in today's world is forever.” He recommends you start by making the most critical fixes, including:
- Creating a 360-degree view of the customer for customer experience.
- Evaluating the employee experience in core journeys.
- Fixing key processes for high-value system integrations, including sales, projects and operations.
- Building a master data graph and making it operational.
High Performers Are Capitalizing on Data Monetization
Once your data foundations are in place, the next step is to monetize your data. Boomi director of enterprise architecture Mark Clifton said, "Every company operating today is a data-driven company. You have access to a bunch of data on your supply chains, operations, strategic partners, customers, and your competitors that can be monetized. Getting data monetization right requires significant effort and is becoming critical for staying ahead of traditional competitors and new disruptors.”
He recommends CIOs start by developing a blueprint that considers the varying data sources, offers a process for recognizing value, presents relevant business models, explores commercialization choices, and points to the various challenges to be addressed.
Getting one’s data house in order is not easy. But the work is valuable to your enterprise and business counterparts. The time to begin the process of making data useful and accessible is now. By taking on this mantle, CIOs can add value to all involved in the data value chain and achieve the long-elusive business relevance they need to get ahead.