The Gist
- Enhanced transparency. The Data Provenance Standards clarify the origins and usage of data in AI applications.
- Cross-industry collaboration. Diverse industries collaborate to develop standards that promote responsible data use.
- CX implications. New standards aid CX leaders in managing customer data more effectively, enhancing trust and personalization.
Back in November 2023, the Data & Trust Alliance (D&TA) announced eight new standards that bring transparency to dataset origins for data and artificial intelligence (AI) applications.
Now, after testing and validation with more than 50 organizations inside and outside of the Alliance — including IBM, Walmart, Pfizer and others — the D&TA has released version 1.0.0 of its Data Provenance Standards.
Why Introduce Data Standards?
In the race to adopt AI, Alliance members, along with other businesses, sought better rules for data quality.
“AI is all about the data. In fact, data may be the only sustainable source of competitive advantage,” said Rob Thomas, SVP, software and chief commercial officer at IBM and chair of the D&TA Data Provenance Standards initiative.
There is little transparency around the data that trains and feeds AI models. According to the Alliance, the consequences, such as copyright infringement and questions around privacy and authenticity, could undermine the technology’s business value and its acceptance by society.
In fact, according to a recent IBM survey, 61% of CEOs say lack of clarity on data lineage and provenance is a top barrier to adoption of generative AI.
What Are the Data Provenance Standards?
Back in November 2023, the D&TA proposed eight standards. Now, after gathering feedback from small- and medium-sized enterprises, plus validation and testing, Version 1.0.0 of the Data Provenance Standards contains 22 metadata fields grouped into three standards. That metadata is intended to travel with the dataset as it is shared and transformed.
The three standards, illustrated in the sketch after this list, are:
- Source: Identifies the origin of the current dataset, including its name, unique URL, issuer and description.
- Provenance: Concerns the data origin geography, dataset issue date, range of dates for data generation, data format and more.
- Use: Covers the intended use of the data, including confidentiality classification, license to use, proprietary data presence and more.
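To make the grouping concrete, here is a minimal sketch of provenance metadata traveling with a dataset. The field names and values are illustrative assumptions for this article, not the standard’s actual 22 fields, which are defined in the D&TA’s Version 1.0.0 release.

```python
# Illustrative provenance metadata for a hypothetical dataset.
# Field names are assumptions modeled on the three D&TA standards;
# they are not the standard's official field names.
dataset_metadata = {
    "source": {
        "dataset_name": "retail-transactions-2023",  # hypothetical dataset
        "dataset_url": "https://example.com/data/retail-transactions-2023",
        "dataset_issuer": "Example Data Co.",
        "description": "Point-of-sale transactions from US retail partners.",
    },
    "provenance": {
        "origin_geography": ["US"],
        "issue_date": "2024-06-30",
        "generation_date_range": {"start": "2023-01-01", "end": "2023-12-31"},
        "data_format": "CSV",
    },
    "use": {
        "intended_use": "AI model training",
        "confidentiality_classification": "internal",
        "license": "commercial, non-transferable",
        "contains_proprietary_data": True,
    },
}
```

Because the metadata travels with the dataset, a downstream consumer can inspect these fields at acquisition time and after each transformation.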
As technology and AI transform industries, organizations need a blueprint for evaluating the data that fuels these algorithms, said Christine Pierce, chief data officer, audience measurement at Nielsen.
“Through the collaboration of experts across multiple industries and disciplines, the D&TA Data Provenance Standards meet this need,” she explained. “The standards promote trust and transparency by surfacing critical metadata elements in a consistent way, helping practitioners make informed decisions about the suitability of data sources and applications.”
Who’s Behind the Data Provenance Standards?
The D&TA Standards were built by a working group of chief data officers, chief information officers, data strategy leaders and other practitioners across more than 15 industries. Alliance companies represented in the group include:
- AARP
- American Express
- Deloitte
- Howso
- Humana
- IBM
- Kenvue
- Mastercard
- Nielsen
- Nike
- Pfizer
- Regions Bank
- Transcarent
- UPS
- Walmart
- Warby Parker
“Safe adoption of future AI tools will require trust and transparency in the data powering them,” said Thomas Birchfield, technical program manager at Transcarent. “Cross-industry collaboration toward a universal set of data provenance standards is a key component of leveraging data effectively and responsibly.”
What Do the Data Provenance Standards Mean for CX Leaders?
These new standards aren’t just changing the game for chief data officers and chief information officers. They also introduce new implications for marketing executives and customer experience leaders.
D&TA’s Data Provenance Standards increase transparency into who collects customer data and how it is collected, stored and used before it even enters the organization, according to Kristina Podnar, senior policy director at the Data & Trust Alliance. “This fosters trust with customers, which is critical for organizations (specifically CMOs and chief customer officers/VPs of contact centers) who are increasingly held accountable for the appropriate handling of customer data.”
Beyond providing visibility into the type of data you’re acquiring, the standards also highlight potential risks, said Podnar.
“For example, if you are acquiring AdTech data for a new product launch, but a large percentage of the data is lookalike or generative synthetic data, it could skew your product targeting strategy,” she explained. “If this data is further ingested into AI models within the enterprise, it could lead to model collapse over time, thereby increasing legal and reputational risks and losses on investment.”
From a tactical perspective, she said, these standards ensure data is appropriately sourced and maintained, allowing marketing and CX leaders to create more personalized, contextually relevant customer experiences. “This, in turn, makes consumers feel understood and valued.”
And, in the contact center, Podnar added, reliable data means more effective call handling and issue resolution, reducing time and resource expenditures.
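To make Podnar’s screening example concrete, here is a hypothetical pre-ingestion check written against the illustrative metadata sketched earlier, with an assumed synthetic_data_pct field added. Neither the field names nor the threshold come from the standard itself.

```python
# Hypothetical screen for acquired data, using illustrative provenance
# metadata fields (assumptions, not the standard's official field names).
MAX_SYNTHETIC_PCT = 20.0  # assumed policy threshold, set by the data team


def screen_dataset(metadata: dict) -> list[str]:
    """Return warnings for a dataset's provenance metadata."""
    warnings = []
    use = metadata.get("use", {})
    provenance = metadata.get("provenance", {})

    # Flag heavy reliance on lookalike or generative synthetic data,
    # which could skew a product targeting strategy.
    if provenance.get("synthetic_data_pct", 0.0) > MAX_SYNTHETIC_PCT:
        warnings.append("High share of synthetic/lookalike data")

    # Flag proprietary data that arrives without licensing terms,
    # before it reaches AI training or personalization pipelines.
    if use.get("contains_proprietary_data") and not use.get("license"):
        warnings.append("Proprietary data present without a license")

    return warnings


if __name__ == "__main__":
    sample = {
        "provenance": {"synthetic_data_pct": 35.0},
        "use": {"contains_proprietary_data": True, "license": ""},
    }
    print(screen_dataset(sample))
    # ['High share of synthetic/lookalike data',
    #  'Proprietary data present without a license']
```

A team might run a check like this when data is acquired and again whenever a dataset is transformed, since the standards intend the metadata to travel with the data.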
What’s Next for the Data & Trust Alliance?
Some organizations have already begun using the Data Provenance Standards. IBM, for instance, tested the standards as part of its clearance process for datasets used to train foundation AI models. The result? Faster dataset clearance and improved overall data quality.
The next step is to broaden adoption. According to the D&TA, many data suppliers and producers shared feedback on the standards, and the Alliance now plans to enlist these organizations as partners in adoption. It has the same goal for toolset providers, whose support could make adoption easier.