If you don't use the data, it's not doing anyone any good PHOTO: Agence Producteurs Locaux Damien Kühn

Early in my career, I interviewed for a software developer position and was asked, “What makes good software?” Being young, I mentioned the usual list of desirable attributes: A system should be bug-free, have a nice user interface and make accurate calculations, etc. 

After listening to me fumble along like that for several minutes, the interviewer took pity on me and stepped in to provide an answer that I consider to be one of the more profound and insightful answers to that question: “Good software is software people use.”

New Question, Same Answer 

In today’s data world, the question to ask is, “What makes good data?” To me, the context has changed (software to data), but the answer remains the same: Good data is data people use.

What makes data usage such an effective metric? Two reasons shine through. First, usage is a strong proxy for value: Folks don’t use data that doesn’t help them. Second, usage is measurable in real time and is attributable to specific actions taken, so you get immediate feedback when you add a data set, provide training or implement a new model.

If data usage is our metric, what are the ways we can improve it? My four favorites are to answer a business problem, make the data accessible, focus on quality over quantity and measure at the right altitude.

Answer a Business Problem

We use data to answer questions and guide decisions. So when you’re working to improve data, focus on those questions and decisions. Too often, organizations launch data projects with no crisp purpose. It is data for the sake of data. This has been especially true of many recent big data platform projects: Enterprises begin these journeys on the belief that they will lead to the end of the rainbow.

If we know the problem, we know how the data will be used. If we know how the data will be used, we guarantee that our usage statistics will go up.

Make the Data Accessible

Taking steps to make the data readily accessible will have an impact throughout an organization. Can the data engineers effectively capture the data feed? Can the data scientists and developers access the data however it is stored? Can executives and analysts effectively interact with reports or raw data as needed?

To state the obvious: If data is inaccessible, it can’t be used.

Focus on Quality Over Quantity

Recently, we have seen a big explosion in the volume of data. But increasing the amount of data you collect is the easy part; the data itself is low-hanging fruit and isn’t that important on its own. What is important is the information the data provides. More data does not necessarily equate to more information.

My favorite example for highlighting this difference occurs in model development. Which would you rather have? Hundreds of variables that could be used to create the world’s best model for predicting tomorrow’s closing prices of the stocks in the S&P 500 Index, or tomorrow’s closing prices of the S&P 500 stocks?

By focusing on quality, we ensure that the data we operationalize won’t just take up space in a data lake — it will be data that people will use.

Measure Usage at the Right Altitude

Every organization is different. Indeed, every project is different. Sometimes a dose of common sense is better than blindly following a regimented approach when evaluating usage.

I once led the development of a 180-page weekly report. In any given week, 99 percent of the report was ignored. Critically though, two pages of the report were needed, and they changed every week. The project was a usage success because, thanks to those two pages, the report was needed on a continual basis. An evaluation of the report that looked at every single item in the 180 pages would not have reached the same conclusion.
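
To make the difference in altitude concrete, here is a minimal Python sketch of the two ways that report could have been scored. Everything in it is an assumption for illustration: the usage log, its layout (week number mapped to the set of pages read) and the function names are hypothetical, since the report's usage was not necessarily tracked this way.

```python
from typing import Dict, Set

TOTAL_PAGES = 180  # length of the weekly report described above


def page_level_usage(views_by_week: Dict[int, Set[int]], total_pages: int) -> float:
    """Average share of the report's pages that were read in a week."""
    if not views_by_week:
        return 0.0
    pages_read = sum(len(pages) for pages in views_by_week.values())
    return pages_read / (total_pages * len(views_by_week))


def report_level_usage(views_by_week: Dict[int, Set[int]]) -> float:
    """Share of weeks in which at least one page of the report was read."""
    if not views_by_week:
        return 0.0
    weeks_used = sum(1 for pages in views_by_week.values() if pages)
    return weeks_used / len(views_by_week)


# Hypothetical log: each week two pages are read, and they differ week to week.
views = {
    week: {(2 * week) % TOTAL_PAGES + 1, (2 * week + 1) % TOTAL_PAGES + 1}
    for week in range(1, 5)
}

print(f"page-level usage:   {page_level_usage(views, TOTAL_PAGES):.1%}")
print(f"report-level usage: {report_level_usage(views):.1%}")
```

Scored page by page, the report looks almost unused. Scored as a whole, it is used every single week. Which number is the right one depends on the decision the measurement is meant to support.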

Good Data Is Data People Use

Tracking data usage provides an effective real-time tool for measuring the success of data initiatives. Once in place, a system of tracking data usage helps identify the potential of data projects and provides data teams with a clear path for improving how they operationalize the data of the enterprise.