What is DataOps? DataOps (data operations) is a new approach to data management that brings together data workers (the individuals who collect, clean and prepare data) and data analysts to help enterprises make data-driven decisions at the moment of opportunity. Bringing together DevOps, data engineers and data scientists improves communication, breaks down silos and ultimately helps businesses make better data-based decisions.

Data Helps Drive Better Decisions

Companies that use data to make decisions tend to make better decisions. CMSWire has been covering this since 2011, shortly after Erik Brynjolfsson and Heekyung Kim of MIT, together with Lorin M. Hitt of the University of Pennsylvania, published a groundbreaking research paper, "Strength in Numbers: How Does Data-Driven Decision making Affect Firm Performance?" The scholars provided some of the first large-scale data on the direct connection between data-driven decision making and company performance. "Companies that characterized themselves as data-driven were 5 percent more productive and 6 percent more profitable than their competitors," according to the research.

At the time, many saw data-driven decision making as a superpower of C-suite executives who had both IT and “rock star” data scientists at the ready. The latter group could not only curate and clean data sets and craft fancy algorithms, but were also adept at telling stories and offering insights about what their data was showing.

There was only one problem: there were not enough data scientists, and those who existed were overloaded with work. Not only that, but managers who weren't part of the C-suite needed access to data and data-driven insights as well.

The Democratization of Data Analytics

A lot has changed since then. Big data has become democratized and analytics are "part of everyone's job," just as Tom Davenport, author of Big Data at Work (Harvard Business School Publishing, 2014), told CMSWire it would be. With so many different teams and individuals handling data, the need for a data consumption model has emerged.

Defining DataOps

Technology analyst firm Gartner has defined DataOps as "the hub for collecting and distributing data, with a mandate to provide controlled access to systems of record for customer and marketing performance data, while protecting privacy, usage restrictions and data integrity."

Ashish Thusoo, co-author of Creating a Data-Driven Enterprise with DataOps (O'Reilly 2017), offered a more pragmatic definition. "DataOps is a new way of managing data that promotes communication between, and integration of, formerly siloed data, teams, and systems. It takes advantage of process change, organizational realignment, and technology to facilitate relationships between everyone who handles data, be they developers, data engineers, data scientists, analysts, and/or business users. DataOps closely connects the people who collect and prepare the data, those who analyze the data, and those who put the findings from those analyses to good business use."

Making DataOps work requires four things: an executive management mandate for democratized data access, a centralized data infrastructure, data analysts or data scientists, and one or more data teams. When these are present, flexible, focused analytics can be produced more quickly and accurately without sacrificing quality and compliance.

DataOps Enables Data-Driven Enterprises

Thusoo's approach to DataOps and a data-driven culture consists of two groups: a data team, which publishes data and manages the infrastructure used to publish it, and insight-seeking line-of-business decision makers, who typically have either data scientists or data analysts on their teams.

Figure: DataOps organization hub-and-spoke model

In Thusoo's model, data scientists and/or data analysts are embedded into business units such as finance, sales and marketing. They work with line-of-business decision makers to pinpoint questions and identify the datasets that need to be analyzed, then translate those questions into SQL (Structured Query Language) or a more sophisticated language. The work is then handed over to the data team.
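
To make the translation step concrete, here is a minimal sketch of how an embedded analyst might turn a business question into SQL. It is an illustration only, not something taken from Thusoo's book: the `orders` table, its columns, the sample rows and the `revenue_by_region` helper are all hypothetical, and an in-memory SQLite database stands in for whatever infrastructure the data team actually runs.

```python
import sqlite3

# Hypothetical sample rows standing in for a dataset published by the data team.
ORDERS = [
    ("East", 120.0),
    ("West", 80.0),
    ("East", 30.0),
    ("North", 50.0),
]

# Business question: "Which regions bring in the most revenue?"
# The analyst's translation of that question into SQL:
REVENUE_BY_REGION_SQL = """
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
"""


def revenue_by_region(rows):
    """Load rows into an in-memory SQLite table and run the analyst's query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(REVENUE_BY_REGION_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, revenue in revenue_by_region(ORDERS):
        print(f"{region}: {revenue:.2f}")
```

In practice the query text is the artifact that matters: it can be reviewed with the decision maker who asked the question before it is passed along to the data team.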

Other DataOps models, such as the one that Ellen Friedman and Ted Dunning offer in their book, Machine Learning Logistics: Model Management in the Real World (O’Reilly 2017), revolve around “organizing teams around data-related goals to achieve faster time to value.” In the book, the MapR executives suggest that DataOps team members can come from product operations, software engineering, architecture and planning, data science, data engineering and product management.

Unlike Thusoo, Dunning and Friedman noted that "Infrastructural capabilities around data platform and network, needs that cut across all projects, tend to be supported separately from the DataOps teams by support organizations."

The Big Win

Regardless of which DataOps model you choose, and there are probably almost as many flavors as there are enterprises that have embraced DataOps, the aim is the same: eliminating friction so that data can be turned into dollars faster without sacrificing security and governance.