
Bad data costs the United States more than $3 trillion per year, according to Thomas C. Redman, the "Data Doc." In a 2016 Harvard Business Review (HBR) article, he explained it this way: "salespeople waste time dealing with erred prospect data; service delivery people waste time correcting flawed customer orders received from sales. Data scientists spend an inordinate amount of time cleaning data; IT expends enormous effort lining up systems that 'don't talk.' Senior executives hedge their plans because they don't trust the numbers from finance."

It's no wonder, then, that salespeople have trouble meeting their numbers, marketers often miss their targets and the C-suite doesn't trust its forecasts. This is one of the main reasons so many decisions are made on gut feeling.

Garbage In, Garbage Out

What is bad data, and where does it come from? Constellation Research founder and principal analyst Ray Wang told CMSWire that bad data has several origins. "It can come through data entry. Manual data entry. It's not intentional or malicious. It's just a typo or the numbers entered in the wrong order," he said. Salespeople sometimes create bad data as well, according to Wang, particularly when they "sandbag their forecasts and make guesses without confidence." He added that sometimes the data isn't wrong but insufficient: there is too little of it to be statistically significant.

On any given day, 30 to 50 percent of the CRM data DiscoverOrg encounters is wrong, according to Katie Bullard, chief growth officer at DiscoverOrg. Data entry errors aren't the only problem, Bullard said. Sometimes "it [the data] is out of date," she said, noting that decision makers often change jobs without public notice, so personalized or automated messages, and even deal proposals, end up targeted at the wrong people. Marketers don't always enter updated contact information either, she explained, and bad data is expensive: "Every bad record costs $11.00. Multiply that by 2000, or more," she said. Here, though, the context is tactical: it concerns the contact information in individual records, not the multiple large, complex data sets used to make predictions.
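
Bullard's arithmetic is easy to extend. The Python sketch below multiplies her $11-per-record figure by a count of bad records. The 10,000-record CRM size is a hypothetical assumption, and the 30 and 50 percent rates are simply the ends of her quoted range, not measurements of any particular database.

```python
# Bullard's quoted figure: each bad CRM record costs about $11.
COST_PER_BAD_RECORD = 11.00


def cost_of_bad_records(bad_records: int,
                        cost_per_record: float = COST_PER_BAD_RECORD) -> float:
    """Dollar cost of a given number of bad records."""
    return bad_records * cost_per_record


def estimated_bad_records(total_records: int, error_rate: float) -> int:
    """Expected number of bad records at a given error rate (0.0 to 1.0)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return round(total_records * error_rate)


# Bullard's example: 2,000 bad records.
print(cost_of_bad_records(2000))  # 22000.0

# Hypothetical 10,000-record CRM at the ends of the 30-50 percent range.
for rate in (0.30, 0.50):
    bad = estimated_bad_records(10_000, rate)
    print(f"{rate:.0%} bad: {bad} records, ${cost_of_bad_records(bad):,.0f}")
```

Even at the low end of the range, a modest database carries a five-figure cost before any forecasting begins.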

Without a solid data foundation, accurate forecasting is difficult, according to Mark Jewett, vice president of product marketing at Tableau Software. "Getting rid of dirty data is a must to get to accurate marketing forecasts. The more data that the marketing team has access to, the more questions they can ask of that data, and the more effective they can be at their role within the organization," he explained.

How Does This Impact the Business?

So what happens when your data is bad? With regard to sales forecasting, Wang said, "Well, you think you are green and all good when you really are red. All the forecasts show 90 percent of hitting quota and you are at 50. When that happens two weeks before the end of the quarter, you are in trouble."

DiscoverOrg helps marketers address much of the "simple data" problem by providing clean, current, basic CRM data, including direct-dial phone numbers and email addresses verified through a combination of tools and more than 200 human fact checkers. Data scientists, by contrast, often need to clean more complex data, at a different level and in a different way.

Build a Clean Data Pipeline

"Every company we talk to is working hard to integrate data and analytics across many aspects of their business, and nowhere is that more true than in marketing," said Jewett. "Bringing data together across multiple sources like Google Analytics, Omniture, DoubleClick, Marketo, and many other data sources is critical to understanding the full marketing pipeline," he added. 

He also noted that making sure the data is clean and compatible can have a huge impact on a team's results by "weeding out less reliable data points and helping to filter through the noise to spot the prevalent trends."
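
As a rough illustration of Jewett's point, the Python sketch below merges contact records exported from several sources, drops records without a usable email key, normalizes emails and keeps only the most recently updated copy of each contact. The record fields (`email`, `title`, `updated`) and the sample data are hypothetical; exports from tools such as Marketo or Google Analytics have their own schemas and would need to be mapped first, and the `@` check is deliberately crude.

```python
from datetime import date


def merge_sources(*sources: list[dict]) -> list[dict]:
    """Merge records from several sources, deduplicated by normalized email.

    Hypothetical schema: each record is a dict with at least "email" and
    "updated" (a datetime.date). When the same contact appears more than once,
    the most recently updated record wins.
    """
    merged: dict[str, dict] = {}
    for source in sources:
        for record in source:
            email = (record.get("email") or "").strip().lower()
            if "@" not in email:  # crude filter: no usable key, drop the record
                continue
            cleaned = {**record, "email": email}
            current = merged.get(email)
            if current is None or cleaned["updated"] > current["updated"]:
                merged[email] = cleaned
    return sorted(merged.values(), key=lambda r: r["email"])


crm = [
    {"email": " Jane@Example.com ", "title": "VP Marketing", "updated": date(2016, 1, 5)},
    {"email": "", "title": "unknown", "updated": date(2017, 1, 1)},
]
automation = [
    {"email": "jane@example.com", "title": "CMO", "updated": date(2017, 3, 2)},
    {"email": "raj@example.com", "title": "Analyst", "updated": date(2016, 11, 20)},
]

for row in merge_sources(crm, automation):
    print(row["email"], row["title"])
```

Here the stale "VP Marketing" title loses to the newer "CMO" record, and the row with no email is discarded rather than polluting downstream analysis.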

Use Data Selectively

But John Timmerman, Teradata's global industry evangelist, cautions against using every piece of data you can get your hands on. "The most important thing a marketer can do is to determine the data that actually has a probabilistic influence on forecasts," he told CMSWire. "Just as we hone and tailor predictive models for customer behavior, it is critical to first look at all the data attributes available to determine which ones (or combinations of attributes) actually play a role in predicting an outcome."

Timmerman explained that by excluding the data that is immaterial to your forecast, you can focus data quality initiatives where they are most needed. "It would be a shame to spend valuable data quality resources on a source or attribute (that isn't needed) to the detriment of one that's critically influential,” he said.
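
One simple way to start that screening is to measure how strongly each candidate attribute moves with the outcome being forecast and set aside the ones that barely register. The Python sketch below uses Pearson correlation as the screen; the attribute names, sample values and 0.3 cutoff are hypothetical. Correlation only catches linear, one-attribute-at-a-time relationships, so it will miss the combinations of attributes Timmerman mentions; model-based feature importance is a common next step.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def screen_attributes(attributes: dict[str, list[float]],
                      outcome: list[float],
                      threshold: float = 0.3) -> dict[str, float]:
    """Keep attributes whose |correlation| with the outcome meets the threshold,
    ordered from strongest to weakest."""
    scores = {name: pearson(values, outcome) for name, values in attributes.items()}
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return {name: r for name, r in ranked if abs(r) >= threshold}


# Hypothetical history: bookings (in $ thousands) over five past periods.
bookings = [10, 12, 15, 18, 20]
attributes = {
    "email_opens": [2, 3, 4, 5, 6],
    "webinar_attended": [0, 0, 1, 1, 1],
    "fax_number_present": [1, 0, 1, 0, 1],
}

for name, r in screen_attributes(attributes, bookings).items():
    print(f"{name}: r = {r:.2f}")
```

In this toy data, `fax_number_present` shows no relationship with bookings, so it would be a poor place to spend data quality effort.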

The approach data scientists and analysts should take, in Timmerman's view, is to "first determine which data points are truly predictive in regards to marketing forecasts. The second step is implementing a governance protocol across all of those data points, sources and attributes to maintain the quality of the information influencing your organizational decisions. This ensures that 'bad data' never enters your analytical ecosystem in the first place," he said.
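
A governance protocol of that kind often starts with validation at the point of entry. The Python sketch below checks each incoming lead against a few rules and quarantines failures instead of loading them. The `Lead` fields and the three rules are hypothetical examples, not a standard, and the email pattern is intentionally loose; a real protocol would also cover ownership, freshness and auditing.

```python
import re
from dataclasses import dataclass

# Loose pattern: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Lead:
    """Hypothetical inbound record."""
    email: str
    company: str
    deal_amount: float


def validate(lead: Lead) -> list[str]:
    """Return a list of rule violations; an empty list means the lead passes."""
    problems = []
    if not EMAIL_PATTERN.match(lead.email):
        problems.append("invalid email")
    if not lead.company.strip():
        problems.append("missing company")
    if lead.deal_amount < 0:
        problems.append("negative deal amount")
    return problems


def ingest(leads: list[Lead]) -> tuple[list[Lead], list[tuple[Lead, list[str]]]]:
    """Split leads into accepted records and quarantined (lead, problems) pairs."""
    accepted, quarantined = [], []
    for lead in leads:
        problems = validate(lead)
        if problems:
            quarantined.append((lead, problems))
        else:
            accepted.append(lead)
    return accepted, quarantined


accepted, quarantined = ingest([
    Lead("ana@example.com", "Acme Corp", 12_000.0),
    Lead("bob@example", "", -500.0),  # several data entry errors at once
])
for lead, problems in quarantined:
    print(lead.email, problems)
```

Quarantining rather than silently dropping keeps the rejected records available for someone to correct at the source.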

Wang told CMSWire that using machine learning and statistical techniques to find historical patterns in how well sales reps have forecast, and then identifying the attributes behind those patterns, can be very helpful. That, he said, is why there is such heavy investment in artificial intelligence (AI) for sales. But even then, as Redman wrote in a separate HBR article, "Bad data can rear its ugly head twice: first in the historical data used to train the predictive model and second in the new data used by that model to make future decisions."
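
Wang's idea of learning from how well each rep has forecast in the past can be illustrated with a far simpler statistical stand-in than the machine learning he refers to: compute each rep's average ratio of actual to forecast bookings and use it to discount current calls. The Python sketch below does that; the rep names and figures are hypothetical. It also shows Redman's caveat in miniature: if the historical forecasts or actuals are themselves wrong, the correction factors inherit the error.

```python
from statistics import mean


def rep_accuracy(history: dict[str, list[tuple[float, float]]]) -> dict[str, float]:
    """Average actual/forecast ratio per rep from (forecast, actual) pairs.

    Quarters with a zero or negative forecast are skipped; reps with no usable
    quarters are left out.
    """
    ratios = {}
    for rep, quarters in history.items():
        usable = [actual / forecast for forecast, actual in quarters if forecast > 0]
        if usable:
            ratios[rep] = mean(usable)
    return ratios


def adjusted_forecast(current: dict[str, float], ratios: dict[str, float]) -> float:
    """Scale each rep's current forecast by that rep's historical ratio.

    Reps without history keep their raw forecast (ratio 1.0).
    """
    return sum(amount * ratios.get(rep, 1.0) for rep, amount in current.items())


# Hypothetical (forecast, actual) bookings for the last two quarters.
history = {
    "ana": [(100_000, 95_000), (120_000, 114_000)],  # consistently close
    "ben": [(100_000, 50_000), (80_000, 48_000)],    # habitually optimistic
}
current = {"ana": 100_000, "ben": 100_000, "cara": 50_000}

ratios = rep_accuracy(history)
raw = sum(current.values())
print(f"raw forecast: ${raw:,.0f}")
print(f"adjusted forecast: ${adjusted_forecast(current, ratios):,.0f}")
```

The gap between the raw and adjusted totals is exactly the "green versus red" surprise Wang warns about, surfaced before the last two weeks of the quarter rather than during them.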