kCura Tackles the Challenge of Big Data

For the last few months, we’ve been talking about big data. From social media to customer experience to cloud storage, the enterprise is struggling to manage and leverage copious amounts of data. The real issue for big data isn’t just where to put it all, but how to access it when you need it.

How Did Our Data Get There?

We turned to Jay Leib, Chief Strategy Officer at kCura, to help us better understand how to handle unstructured data so that the enterprise can gain visibility into it and avoid liability.

When we think of data, we don’t often think about how it got there. The problem isn’t that content is being created inappropriately (although it is sometimes duplicated unnecessarily). The problem is that once content is created, there’s no clear record of where it lives, how long it should be kept, or how it may be reviewed. Leib says organizations need a review strategy so that the right information is retained, safely and legally.

The Risks of Irrelevant Data

As organizations implement discovery response programs designed to learn from their data and respond to requests more effectively, tools like predictive coding draw on consumer technologies that empowered enterprise users have already come to know. But as assisted review starts to behave like the algorithms that select songs for us on Pandora or suggest products for us on Amazon, it’s important to recognize that the consequences of surfacing irrelevant data are not all equal.

The flawed analytics at Netflix, which can’t distinguish between movies watched by different people sharing the same account (and therefore recommend both Lethal Weapon and Miss Congeniality), may cause some frustration. The same kind of error is far riskier for the enterprise, where suggested documents can contain the same keywords yet differ in context.

Leib stresses that companies must balance content creation against its inherent legal implications. It’s important not to dismiss something that is relevant to one person simply because it isn’t relevant to another. Technologies like predictive coding and text analytics need to account for content in context: not only how one user uses it, but how many users use it across different variations and situations. For these technologies to be trained to adapt to human behavior, they must understand behaviors as they actually are, not as they could be.

Of course, content creation and its subsequent discovery are just pieces of the big data puzzle. Addressing them won’t necessarily reduce the amount of data, but it does help make that data more manageable and defensible.