“Hmmm, okay.” He looked around inside the car a few more seconds, and said, “Yes, this will be fine.”

I thought to myself, “It will be fine? That’s all you have to say?” I knew my colleague was a man of few words, but at that moment, I felt a little like Babe when Farmer Hoggett told him, “That’ll do, pig. That’ll do.”

My Babe Moment

This happened several years ago. My team at work hosted a successful car wash to raise money for a charity event. We raised a lot of money, but what I remember most is my Babe moment.

It began when the Toyota Camry pulled up. The owner wanted both the inside and outside of the car cleaned, and I was on interior car duty at the time. The exterior was unremarkable. Then I opened the driver’s door.  

I don’t know what got to me the most: the beige-turned-solid-black-from-years-of-dirt-and-grime steering wheel; or the petrified McDonald’s fries and pulverized Goldfish ground into the carpet; or maybe the dried-up slime on the windows. I’m not a germaphobe, but I do like clean, and this, well, I had no words.

I was going to do right by this car. Over two hours later, a newly-detailed Camry emerged. It looked fabulous, and I knew my colleague would appreciate the blood, sweat and tears that went into restoring his family’s car.

Big, Dirty Data and Babe Moments

I’m sure my colleague never set out to have a (really) dirty car, just as most organizations don’t aspire to collect and store inaccurate, incomplete or erroneous data — otherwise known as dirty data. But let’s face it, having a dirty car or dirty data is inevitable. How you choose to deal with that dirt, however, is what will set you apart from the masses.

If you’re a data professional, this discussion about dirty data is not new; it’s just renewed. With the advent of big data, this discussion can no longer be ignored, minimized or put off until you have more time, because that time will never come. Big data is ushering in a lot more data from many more data sources at a much faster rate than ever before. And that, in turn, means a whole lot more dirty data to deal with. It’s inevitable.

Has your organization started to embrace big data yet? If so, here are a few tips to help you deal with your big, dirty data:

  • Prepare for it: Do you have a data governance framework in place? If you don’t, then that’s where you need to start. If you do have a framework, the good news is that you don’t need a separate one for big data. You will just need to extend your existing framework to address big data’s expanding volumes, sources and data types.
  • Get the right tools: This is no longer the world of Excel where it’s relatively easy (but time-consuming) to keep your data clean. You will need to work with more sophisticated tools to highlight and address data discrepancies, anomalies and outliers. There are a lot of good tools out there to help you with this.
  • Provide visibility into its origin and history: Cleaning up data is an essential step in getting users to trust and use the data for decision-making activities. Another trust-building step is giving users visibility into the data’s origin and change history. Big data technologies have drastically reduced the cost of storing data, so it’s no big deal anymore to store data in its raw, native state, along with its multiple transformed states.
  • Dedicate resources to it: This not only includes human resources, but also machine and tool resources. Ideally, you want these resources in place to meet the data when it comes in. Consistency is key.
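
To make a couple of these tips a little more concrete, here is a minimal Python sketch of what "meeting the data when it comes in" might look like. It is an illustration only, not a recommendation of any particular tool: the `Record` class, the `clean` and `flag_outliers` functions, the `"crm"` source label and the sample values are all made up for this example. The idea is to keep each raw value untouched, store the cleaned value next to it, log every change, and flag possible outliers using the median absolute deviation (one common rule of thumb among many).

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Optional


@dataclass
class Record:
    """One incoming value plus its origin and change history (hypothetical)."""
    source: str                      # where the value came from
    raw: Optional[str]               # value exactly as received, never overwritten
    cleaned: Optional[float] = None  # transformed value, stored alongside the raw one
    history: List[str] = field(default_factory=list)  # log of what was done


def clean(record: Record) -> Record:
    """Parse the raw text into a number and log each step."""
    if record.raw is None or record.raw.strip() == "":
        record.history.append("flagged: missing value")
        return record
    try:
        record.cleaned = float(record.raw.replace(",", "").strip())
        record.history.append("parsed raw text to float")
    except ValueError:
        record.history.append("flagged: not a number")
    return record


def flag_outliers(records: List[Record], threshold: float = 3.5) -> None:
    """Flag values far from the median, using the median absolute deviation."""
    values = [r.cleaned for r in records if r.cleaned is not None]
    if len(values) < 3:
        return  # too few values to judge
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return  # values are too uniform for this rule to say anything
    for r in records:
        if r.cleaned is None:
            continue
        score = 0.6745 * abs(r.cleaned - med) / mad  # "modified z-score"
        if score > threshold:
            r.history.append("flagged: possible outlier")


# Example run with made-up order totals from an assumed "crm" source.
incoming = ["1,200", "1150", " 1300 ", "", "abc", "98000"]
records = [clean(Record(source="crm", raw=value)) for value in incoming]
flag_outliers(records)
for r in records:
    print(repr(r.raw), r.cleaned, r.history)
```

Because the raw value is never overwritten, anyone who doubts a cleaned number can trace it back to what actually arrived and see each step applied along the way. In practice you would likely lean on a dedicated data quality tool rather than hand-rolled code, but the principle is the same.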

A word of caution: If your organization is struggling just to keep your existing “small” data cleaned up, then get your small data house in order before you bring in the big data. The point is to start small, making sure you have the right tools and processes in place to get the dirt out of your data.

Clean Pays

This past weekend, I traded in my car for a newer, sleeker version of what I already had. Coincidentally, my car’s interior was soft beige, the same color as my colleague’s Camry. The only difference was that I was somewhat fanatical about keeping my car clean, inside and out, on a regular basis.

Was it worth it? Let’s just say that I got better than a private party offer from the dealer. Why? Because my 3-year-old car was cleaner than some of the new models on the lot. The only thing missing was that new car smell.

Bottom line: Clean pays, but it’s a lot of hard, often thankless, work — whether you’re talking cars or data. And keeping it clean is not a once-and-done project: it’s a regular, ongoing activity that will set you apart from the masses. Dare to be clean.

Title image by ** RCB **, used under the Creative Commons Attribution 2.0 Generic License.