How Big Data Projects Are Different

How is big data strategy different from any other technology-related strategy? There’s definite overlap with other IT strategies, including the need to be aligned with business strategy, to have strong sponsorship, to address specific business problems, and to have decision-making mechanisms for resource allocation and ongoing capability development -- that is, good governance processes.

The key difference lies in what makes Big Data unique: the implications of the classic three V’s of Big Data (volume, velocity, and variety).

Big Data combines traditional foundational capabilities with two added dimensions: parallel processing of large amounts of data that arrive rapidly and change quickly, and the need to normalize that data because it comes from multiple, diverse sources. As in any strategy, the first order of business is… business.

The first question to ask is: What is the business justification for the initiative? Many companies launch big data programs and projects without a clear set of goals and objectives. Organizations realize that they need certain infrastructure in order to stand up a Big Data capability, but building it without an end goal in mind invariably leads to mistakes in tool selection, system configuration, and data curation.

The challenge in establishing business objectives is that the business side (and sometimes the IT side) does not necessarily understand what Big Data can do. Asking business users what they want will lead to ambiguous answers or blank stares.

Big Data initiatives need to be focused on a problem to solve and a set of hypotheses about solving that problem -- the scenarios where insights are to be gleaned and the data sources and variables that can be manipulated in order to test outcomes. What problem is the business struggling with? What is the business impact of the problem? What are the ways in which related processes are measured? What existing sources of data are currently used as inputs? Are there new sources of data that can inform the solutions? How can various permutations of the solution be tested? How can the inputs and conditions be varied to test possible solutions and the hypothesis?

How Big Data Projects are Different

What’s truly different? Walmart has been mining large amounts of structured data for sales trends and pricing impact. But this is not the true definition of Big Data. This task could be accomplished through traditional relational database and business intelligence techniques, albeit with large hardware requirements. As Walmart added information from surveys or online clickstream behaviors (two sources of unstructured or semi-structured data), it crossed over into the realm of Big Data, because it then had the classic mix of the three V’s: large amounts of fast-changing data from heterogeneous sources.

A Big Data Example Scenario - Optimizing the User Experience

An organization measuring the impact of social media in a marketing mix might have the following inputs: customer satisfaction scores; likelihood to recommend; social media sentiment; marketing measures such as email clickthroughs, website visits, conversions, and abandonment rates; sales transaction metrics across categories; and customer demographics. The organization might use tracking URLs to attribute site visits to social media campaigns and then determine how participation in those campaigns correlates with uplift on the ecommerce site. This mix will have a number of inputs, since shopping conversion is only the last step in a potentially long series of activities and interactions, each of which strengthens or weakens the brand and the relationship.
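The attribution step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the visit log, the shop.example.com URLs, and the use of a `utm_source` query parameter as the tracking tag are all hypothetical, and a real analysis would need far more data and a proper statistical test before claiming uplift.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical visit log: (landing URL, whether the visit ended in a purchase).
visits = [
    ("https://shop.example.com/?utm_source=twitter", True),
    ("https://shop.example.com/?utm_source=twitter", False),
    ("https://shop.example.com/?utm_source=facebook", False),
    ("https://shop.example.com/", False),
    ("https://shop.example.com/", True),
    ("https://shop.example.com/", False),
    ("https://shop.example.com/", False),
]


def source_of(url):
    """Return the utm_source tag on a URL, or 'untracked' if it has none."""
    params = parse_qs(urlparse(url).query)
    return params.get("utm_source", ["untracked"])[0]


def conversion_rates(visit_log):
    """Map each traffic source to its conversion rate (purchases / visits)."""
    totals, purchases = {}, {}
    for url, converted in visit_log:
        src = source_of(url)
        totals[src] = totals.get(src, 0) + 1
        purchases[src] = purchases.get(src, 0) + int(converted)
    return {src: purchases[src] / totals[src] for src in totals}


rates = conversion_rates(visits)
baseline = rates["untracked"]

# Uplift here is simply the difference from the untracked baseline rate.
uplift = {src: rate - baseline for src, rate in rates.items() if src != "untracked"}
```

Comparing each tagged source against the untracked baseline is the simplest possible design choice; it says nothing about causation, and it ignores the earlier interactions that precede a conversion.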

Several important questions can be asked and answered: What content is most effective in engaging across various social media vehicles? How does variation in tone (humorous versus factual for example) impact the next behavior? How does that behavior change according to product line, web channel, specific web property, demographic segment, geographic territory, or user intent? The overall hypothesis consists of a range of questions about the impact of campaigns using a variety of methods for touching, attracting, engaging, converting and retaining customers.

A key element of a Big Data strategy is the quality and provenance of the data sources. Without harmonized data and consistent metadata, a great deal of work must go into cleaning the data, performing extract, transform, and load (ETL) functions, and making the data usable for integration into analytic models.
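As a rough illustration of what harmonization involves, the sketch below maps records from two sources onto one common schema. Everything in it is assumed for the example: the field names, the ID and date formats, and the sample rows are hypothetical, and the provenance of each record is kept only as a simple `source` label.

```python
from datetime import datetime

# Hypothetical raw records from two sources with different field names and formats.
survey_rows = [{"CustID": " 0042 ", "Score": "9", "Date": "03/15/2013"}]
clickstream_rows = [{"customer_id": "42", "event": "view", "ts": "2013-03-15T10:22:00"}]


def normalize_id(raw):
    """Strip whitespace and leading zeros so customer IDs match across sources."""
    return raw.strip().lstrip("0") or "0"


def transform_survey(row):
    """Map a survey record onto the common schema."""
    return {
        "customer_id": normalize_id(row["CustID"]),
        "date": datetime.strptime(row["Date"], "%m/%d/%Y").date().isoformat(),
        "source": "survey",  # provenance of the record
        "value": int(row["Score"]),
    }


def transform_click(row):
    """Map a clickstream record onto the common schema."""
    return {
        "customer_id": normalize_id(row["customer_id"]),
        "date": datetime.fromisoformat(row["ts"]).date().isoformat(),
        "source": "clickstream",  # provenance of the record
        "value": row["event"],
    }


# Load step: here just a single list that an analytic model could consume.
harmonized = [transform_survey(r) for r in survey_rows] + [
    transform_click(r) for r in clickstream_rows
]
```

Once both records share a customer ID format and a date format, they can be joined; before that step, the same customer would appear as two unrelated people.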

The model can range from a simple one with few dimensions to multi-dimensional statistical models. The model requires data as inputs, operates on that data based on various choices that the modeler makes in order to test the business theory, and produces an output that is either a visualization or a recommendation of some sort. At the end of the analysis the purpose is to make a decision (sometimes automatically and dynamically): offer a product at a particular price; discount a product for a certain time period; or offer a product for upsell. These decisions are all part of the initial hypotheses -- the use cases for analysis.

The Challenge Ahead

Begin your Big Data strategy with an understanding of the goals of the business. Identify the specific conditions that can be varied as inputs, and produce an actionable answer that guides action at either the collective or the individual level. At the collective level, that action could be an allocation of resources for a particular channel or marketing vehicle. At the individual level, that action might be making a recommendation for a related item for purchase through analysis of spending patterns by individuals with similar shared characteristics and in the same context. The challenge lies in relating hundreds of characteristics and variables over very large and varied data sets and doing so in near real time.
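The individual-level recommendation can be illustrated with a deliberately small sketch. It assumes, purely for the example, that each customer is described by spend per product category and that similarity is measured with cosine similarity; the customer names and figures are hypothetical. It ignores context and real-time constraints, which are exactly the hard parts at scale.

```python
import math

# Hypothetical spend per product category for each customer.
spend = {
    "alice": {"tents": 200.0, "boots": 120.0},
    "bob": {"tents": 180.0, "boots": 100.0, "stoves": 60.0},
    "carol": {"laptops": 900.0, "stoves": 10.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse spend vectors."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(customer, profiles):
    """Suggest the top category bought by the most similar other customer
    that the target customer has not bought yet, or None if there is none."""
    target = profiles[customer]
    others = [c for c in profiles if c != customer]
    nearest = max(others, key=lambda c: cosine(target, profiles[c]))
    candidates = {k: v for k, v in profiles[nearest].items() if k not in target}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


suggestion = recommend("alice", spend)
```

With three customers and a handful of categories this is trivial; the difficulty described above comes from doing the same comparison across hundreds of variables and very large, varied data sets.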

Title image by Roland C. Vogt, used under the Creative Commons Attribution-No Derivative Works 2.0 Generic License.