The consumer packaged goods (CPG) industry is overwhelmed with disruption. There are 9,000 more products in US grocery stores than there were in 1990, but the average retail store is almost 7,500 square feet smaller. A new product hits the shelves approximately every two minutes. Consumers at an individual level want to tailor their shopping cart to their needs — organic, gluten-free, heart-healthy, sustainable — and the list is growing. All of this means a massive influx of data — from consumer preferences and purchases to insurgent competitors stocking the shelves or building new stores — for retailers and brands to analyze and derive insights from.

To deliver consumers the right products at the right time, businesses need to curate millions of data points and derive insights that inform marketing and sales efforts. Enter AI, which uses these data inputs to better understand how consumers are shopping, why they shop and, most importantly, what they will buy in the future. Based on this, companies are fundamentally shifting how they approach product development cycles, pricing models and the challenge of changing the minds of fickle consumers.

Don’t Assume AI Insights Arrive from Sound Data

But there’s a catch. AI doesn’t operate in a vacuum. It requires clean data inputs to achieve valuable outputs. Unfortunately, AI can’t always tell good data from bad data, and bad data carries inherent biases. So-called “good enough” data requires human input to make corrections, and that introduces the chance of more problems and inefficiencies.

Many businesses believe they can operate on “good enough” data, but few realize the high cost of ignoring data quality. In fact, correcting data after it has been created can be 10 times more costly than implementing upstream controls at the point of data entry.

The solution? Do just that: put controls in place upstream and make data quality a priority.

Identify Data Outputs, Quality Metrics and Consequences

First, identify what outputs will be required of the base data and what decisions will be made on those outputs. For example, are you marketing cage-free eggs to millennials? Testing out a pricing adjustment for a software application? Releasing a new line of hard seltzers geared toward a primarily wine and spirits drinking population? Before rolling out each initiative, businesses should take a more targeted view of their big data, turning it into local, manageable and personal data sets that can be easily activated.

Next, you need concrete metrics for what qualifies as clean data, e.g. the accuracy, completeness and aggregation of the data. If you’re a retailer and you’ve decided to stock the store with a plant-based meat alternative, you need access to thorough research. Examples include consumer preference data in your region that points to the likelihood of shoppers gravitating toward plant-based meat alternatives; sales of similar products already in your store; awareness of which demographics would potentially be interested; recognition of a price point that has been successful for other stores with a similar consumer base; and a plan to market accordingly (via in-store, email and snail mail marketing to a targeted consumer database). Data needs to be accurate, but beyond that, it needs to be thorough, accounting for the many variables that reflect consumer purchases.
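
What a “concrete metric” might look like in practice can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the sample records, the field names (`sku`, `price`, `region`) and the rule that a price must be positive are all assumptions made up for the example, and rule-based validity is only a rough stand-in for accuracy.

```python
# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"sku": "A1", "price": 4.99, "region": "Northeast"},
    {"sku": "A2", "price": None, "region": "Midwest"},
    {"sku": "A3", "price": -1.00, "region": ""},
    {"sku": "A4", "price": 6.49, "region": "South"},
]


def is_filled(value):
    """Treat None and the empty string as missing."""
    return value is not None and value != ""


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if is_filled(r.get(field))) / len(rows)


def validity(rows, field, is_valid):
    """Share of filled values that pass a business rule (a rough proxy for accuracy)."""
    values = [r.get(field) for r in rows if is_filled(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


report = {
    "price_completeness": completeness(records, "price"),
    "region_completeness": completeness(records, "region"),
    # Assumed rule: a price must be greater than zero.
    "price_validity": validity(records, "price", lambda p: p > 0),
}

for metric, score in report.items():
    print(f"{metric}: {score:.0%}")
```

Scores like these only become useful once a team agrees on the threshold below which a data set should not feed a decision; that threshold is a business call, not something the code can supply.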

And if there are issues with the data, what are the most important attributes that will need correcting? What are the inherent consequences and risks? For example, how much could sales and growth projections be altered? These are all questions a tech team needs to be able to answer, particularly if these data inputs are powering an AI-based infrastructure that helps track purchasing trends and predict future growth. Ideally, the team starts with clean data and 100 percent confidence in how it is expressed.

AI only works on data, so having large, diverse and inclusive data sets is crucial. If the data is wrong, all of the AI that follows is wrong.

Augment Human Intelligence with Data Science

Neither AI nor human data scientists reach their potential in isolation; they need each other. Data quality analysis shows this clearly.

An increasing percentage of data is created passively, as consumers interact with technology. Passively generated data has many pitfalls, such as selection bias, misattribution, compliance issues and missing data. AI unlocks the value of big data, but it requires data expertise to understand which pitfalls need to be solved for, and a truth set against which to train and benchmark. Without the data expertise and the truth set, there is a great risk of applying an inefficient algorithm that ultimately generates misleading insight and incorrect predictions. In today’s fast-moving world of consumer preferences and retail dynamics, no CPG company can afford a misstep in its product innovation, distribution, pricing strategy or promotions.

Some data sets also go stale: their accuracy or efficacy diminishes over time. We need to think about data as a truth set and use technology and real people to calibrate those truth sets. Increasingly the marketplace will look for trust in data and transparency in how it is collected, cleaned, codified, aggregated, permissioned and ultimately used.
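
A staleness check is one small, automatable piece of that calibration. The sketch below assumes each data set carries a date on which it was last verified against a truth set. The data set names, the dates and the 90-day window are hypothetical, chosen only to illustrate the idea; the right window depends on how quickly the underlying consumer behavior changes.

```python
from datetime import date, timedelta


def is_stale(last_verified: date, today: date, max_age_days: int = 90) -> bool:
    """Return True when a data set has not been re-verified within the allowed window."""
    return (today - last_verified) > timedelta(days=max_age_days)


# Hypothetical inventory of data sets and the dates they were last verified.
datasets = {
    "loyalty_purchases": date(2024, 1, 15),
    "regional_preferences": date(2023, 6, 1),
}

as_of = date(2024, 3, 1)  # fixed reference date so the example is reproducible
stale = sorted(name for name, verified in datasets.items() if is_stale(verified, as_of))

print("Needs recalibration:", stale)
```

A flag like this does not fix anything by itself; it tells the people responsible for the truth set where to look first.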

Fine wines get better with age. Data does not. Data quality and transparency in its collection and development should never be optional, which makes the deployment of data scientists a requirement and not a luxury. It’s essential to identify problems and take proactive measures to better manage your data within a larger data strategy. The alternative is to spend an exorbitant amount of time and money rectifying your mistakes.