signs pointing everywhere and nowhere
PHOTO: Daniele Levis Pelusi

Data may be the new oil, but countries like Venezuela have plenty of oil and still cannot get it out of the ground. What good is oil if you can’t use it?

That, in essence, is the data match rate problem.


The data match rate problem occurs when you have data sitting in one system and need to get it into another. For example, in marketing, customer data may need to go from a customer relationship management (CRM) system to a data management platform (DMP) for audience segmentation, from a DMP to a demand-side platform (DSP) for media buying, or from a DMP into a dynamic creative optimization (DCO) system for personalization. Match rates are how often two systems can recognize the same user so they can share user data.
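To make the definition concrete, here is a minimal sketch of a match rate as the share of users in a source system that a partner system also recognizes. The function name `match_rate` and the sample identifiers are hypothetical, and real platforms typically compare hashed or synced IDs rather than plain strings.

```python
def match_rate(source_ids, partner_ids):
    """Share of source users that the partner system also recognizes."""
    source = set(source_ids)
    if not source:
        return 0.0
    return len(source & set(partner_ids)) / len(source)


# Hypothetical identifiers held by each system (illustration only).
crm_users = {"u1", "u2", "u3", "u4", "u5"}
dmp_users = {"u2", "u4", "u5", "u9"}

rate = match_rate(crm_users, dmp_users)
print(f"{rate:.0%}")  # 3 of the 5 CRM users are known to the DMP -> 60%
```

Note that the rate is measured from the source system's side: the extra user the DMP knows about ("u9") does not change how many CRM users can be matched.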

Marketers often complain about their data match rates and ask vendors to explain why they aren’t as high as they should be. However, underlying these issues are two different data match rates that relate to two different problems and answers: the accuracy of inputted user data and the precision of outputted data.


Why Match Rates Fail #1: User Data Input Accuracy

Marketers collect tons of information about users when they interact with their brands. They can get registration information, like names and phone numbers for signups, as well as email addresses or logins. They get billing information when there is a transaction. They drop cookies on users that visit their sites and collect mobile device IDs when people use their apps.

Here's where that data accuracy goes astray: fake users and multiple identities.

Some people never provide their real information out of fear of being spammed or compromising their privacy. They provide fake emails or phone numbers just to get to the content and, as a result, the CRM now has more users than other systems.

And sometimes the “users” that visit a site are actually bots. Some systems will drop cookies on them, but others may recognize them as fraudulent and have them removed immediately.

In other cases, real people will visit a site and provide accurate information but may register multiple times, or register once with different personally identifiable information (PII) than what is in other systems. For example, I may register for LinkedIn with my work email address but give a commercial site my personal email address. Or I may have registered with a system using my old college street address and then re-registered with a new address a few years later, in which case I am entered again as a new customer in the same CRM system. Now you try to find me twice but can only find me once in a partner solution. If you cannot resolve these identities to the same person, you will not be able to match the data across systems.
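One simple way to think about resolving multiple identities is to link any records that share an identifier. The sketch below is only an illustration under that assumption: the field names `email` and `phone`, the sample rows and the `resolve_identities` helper are all hypothetical, and commercial identity resolution relies on far richer signals and matching rules.

```python
def resolve_identities(records):
    """Group records that share any identifier (email or phone).

    records: list of dicts with optional "email" and "phone" keys.
    Returns a list of sets of record indexes, one set per resolved person.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # normalized identifier -> first record index that used it
    for idx, rec in enumerate(records):
        for key in ("email", "phone"):
            value = rec.get(key)
            if not value:
                continue
            ident = (key, value.strip().lower())
            if ident in first_seen:
                union(idx, first_seen[ident])
            else:
                first_seen[ident] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), set()).add(idx)
    return list(groups.values())


# Hypothetical CRM rows: rows 0 and 1 are one person who registered twice.
crm = [
    {"email": "Pat@Work.example", "phone": "555-0100"},
    {"email": "pat@home.example", "phone": "555-0100"},
    {"email": "sam@home.example", "phone": "555-0199"},
]
people = resolve_identities(crm)
print(len(crm), "records ->", len(people), "people")  # 3 records -> 2 people
```

Here the shared phone number ties the work and personal email addresses together, so the CRM counts two people instead of three. Without a shared identifier, the duplicate would stay in the file and drag down the match rate.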

Consequently, your “match rate” is really a function of both your own data input accuracy and how much user data the other system has to match against it. So when you see match rates of 20 percent instead of 50 or 60 percent, the questions you can investigate and try to answer yourself are: How much of our user data is verified, up to date and reliable? And do we have a solution for resolving multiple identities to an individual?

This type of match rate matters when it comes to maximizing reach and scale in targeting and measuring specific people, but it does not lead to waste in people-based marketing because you won’t spend money on media or measurement for people who do not match.


Why Match Rates Fail #2: User Data Output Precision

Having far more impact from a budgetary standpoint is the match rate between systems downstream of the initial CRM onboarding that creates targetable audiences. At this point, if you’re paying for technology or media based on using those audience segments but cannot actually pass on information about the specific user, you are wasting a lot of money.

Where the first match rate problem is a data accuracy problem, the second match rate problem is a data precision problem. Accuracy is comparing the closeness of a measured value to a standard or known value (is the uploaded PII accurate?). Precision is comparing closeness of two or more measurements to each other; in this case, precision is a measure of how often two systems can agree when determining if a user is a unique, known person.

The reason it isn’t as simple as copying, pasting and calling it a day is that you’re trying to get data on a specific person from one system to another system when there are millions of people. How do you know which John Smith gets which John Smith’s data from the other system? How do the two systems even know they are talking about the same person? The ability to resolve this is the answer behind your data “match rate.”

Most programmatic systems, such as DSPs, DCOs and DMPs, need to ID-sync, often through “cookie syncing.” That means two systems both drop cookies on users on the same ad call by being coded into the ad tag. Because they drop their cookies at the same time on the same impression, they can tell each other over a separate connection which user ID each one holds for that exact impression, then line up those IDs to build an ID matching table between the systems.
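The ID matching table described above can be pictured as a join on the shared impression. This is a simplified sketch, not how any particular vendor implements it: the log format, the IDs and the `build_match_table` helper are hypothetical, and real syncs happen through live pixel calls or server-to-server exchanges rather than after-the-fact log joins.

```python
def build_match_table(dmp_log, dsp_log):
    """Map DMP user IDs to DSP user IDs for impressions both systems saw.

    Each log is a list of (impression_id, user_id) pairs recorded by one system.
    """
    dsp_by_impression = dict(dsp_log)
    table = {}
    for impression_id, dmp_user in dmp_log:
        dsp_user = dsp_by_impression.get(impression_id)
        if dsp_user is not None:  # both systems cookied this impression
            table[dmp_user] = dsp_user
    return table


# Hypothetical logs: the DSP never saw impression "imp-2".
dmp_log = [("imp-1", "dmp-A"), ("imp-2", "dmp-B"), ("imp-3", "dmp-C")]
dsp_log = [("imp-1", "dsp-77"), ("imp-3", "dsp-42")]

table = build_match_table(dmp_log, dsp_log)
print(table)  # {'dmp-A': 'dsp-77', 'dmp-C': 'dsp-42'}
```

User "dmp-B" has no entry in the table because only one system was present on that impression, so nothing about that user can be passed to the DSP.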

The problem is that the scale or reach between systems is often mismatched. If users get cookied by one system but not another (because of inconsistent pixel setup on a marketer website, different cross-device graphs, different persistency of cookies, etc.), then there is a “default” or “no match.” Systems with a large scale will see and cookie the most people, while those that see limited impressions across their customers will be less likely to match all incoming people to people already present in the system.

Consequently, a DMP may track 10 million people interested in buying a car from all the publishers and marketers it works with, but when it tries to sync with the DSP so the DSP can go buy those people in programmatic auctions, the DSP may only find 7 million of those people, since it hadn’t seen the other 3 million people at the same time as the DMP. So a marketer could be paying for an expensive DMP to store all this data but not be able to use it in its media buying (DSP) or creative personalization (DCO) if they don’t have solutions that can match with each other well.
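The arithmetic behind that example is worth spelling out. The sketch below uses the 10 million and 7 million figures from the example above; the platform fee is a made-up number included only to show how an unmatched audience raises the effective cost of the users you can actually reach, and `addressable_summary` is a hypothetical helper.

```python
def addressable_summary(segment_size, matched, platform_fee):
    """Summarize how much of a segment is usable downstream.

    platform_fee is a hypothetical fixed cost for storing the full segment.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return {
        "match_rate": matched / segment_size,
        "unmatched_users": segment_size - matched,
        "fee_per_stored_user": platform_fee / segment_size,
        # None when nothing matched, since the cost cannot be spread over anyone.
        "fee_per_matched_user": platform_fee / matched if matched else None,
    }


# 10 million users in the DMP segment, 7 million found by the DSP.
# The 100,000 fee is an assumed figure for illustration only.
summary = addressable_summary(10_000_000, 7_000_000, platform_fee=100_000.0)
print(f"match rate: {summary['match_rate']:.0%}")        # match rate: 70%
print(f"unmatched: {summary['unmatched_users']:,}")      # unmatched: 3,000,000
```

Under these assumptions, the same fee is spread over 7 million usable people instead of 10 million, so the cost per person you can actually buy against is higher than the cost per person stored.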

In the age of people-based marketing, brands and agencies need to think of how often they are able to recognize a user and do something about it. Match rates matter. While getting to 100 percent is the goal, focusing on the percentage and not the underlying reason and situation will not tell you how to improve.