There's gold in them there hills.

If that sounds like one of the corny, tired lines we used when writing about the emerging big data and Hadoop market nearly five years ago, it is. Only now we're referring to InfoArchive, one of the products in the portfolio of EMC's Enterprise Content Division (ECD), which also includes Documentum and EMC Leap.

Jeroen van Rotterdam, EMC ECD's CTO, told CMSWire that from the fourth quarter of 2015 through the second quarter of 2016, InfoArchive's quarterly revenues grew between 100 percent and 400 percent year-over-year.

Compare that to the growth of EMC spinoff Pivotal, which EMC CEO Joe Tucci and prospective EMC owner Michael Dell can't stop talking about. Pivotal's revenues grew 56 percent year-over-year, and its annualized recurring revenue grew 200 percent year-over-year.

Mum on InfoArchive

So why is no one talking about InfoArchive's results, or about the fact that only two years after entering the market, it has emerged as a leader in a Gartner Magic Quadrant alongside Informatica, IBM Optim and HPE Autonomy, and ahead of OpenText?

Some may argue there's nothing very compelling about indexing, protecting and migrating data from content management systems, secondary databases or flat files to lower-cost storage for policy-based retention. Here's what is compelling: mining that data for insights.

Van Rotterdam told CMSWire that while companies generally buy InfoArchive for compliance and cost reduction, they later discover something at least as valuable: by bringing together data from multiple sources and exposing it to data scientists, they can glean entirely new insights.

This isn't a theory. InfoArchive is already being used for those purposes in some financial services, healthcare and life sciences companies, van Rotterdam said.

What Analysts Think About InfoArchive

While InfoArchive isn't the only product of its kind, it is the new kid on the block. Forrester analyst Cheryl McKinnon said it opens the door to a number of compelling use cases, some of which many data lake solutions (typically Hadoop-based) don't address well.

"InfoArchive, because it was initially built alongside Documentum, has a governance layer that can be used to apply retention policies, legal holds, records management and the like, thereby providing more access to more types of data," she said.

McKinnon also suggested that InfoArchive could take off in a big way as regulated enterprises move to the cloud, because it keeps track of which data must be retained even after the application that created it is no longer in use.

Digital Clarity Group analyst Alan Pelz-Sharpe said InfoArchive is "simple but quite brilliant" because it lets organizations actively archive data from multiple sources.

"Data doesn't just go to InfoArchive to die — it can still be accessed," he said, explaining that most archives are very static. Once data goes into them, it is still possible to retrieve it, but "far from easy." He also noted that "most archives are singular, not unified, so you end up with lots of silos. Here (with InfoArchive), everything (no matter how different) can go to one place and be accessed in one place."

InfoArchive Goes Open Source

This week EMC ECD upped the ante by open sourcing what it calls the InfoArchive SIP SDK. The tool promises to make it easier for companies to implement InfoArchive and to realize its benefits more quickly.

A SIP (Submission Information Package), in this case, is a package containing two XML files: the first holds the archived data or, for unstructured content, metadata with links to the unstructured content files; the second is a manifest holding information about the archived data. SDK, of course, stands for software development kit.
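To make that two-file structure concrete, here is a minimal Python sketch that assembles such a package using only the standard library. It is not the InfoArchive SIP SDK's API and does not use InfoArchive's actual file names or schemas: the names data.xml, manifest.xml and content/, the XML element names, and the SHA-256 checksum in the manifest are all hypothetical choices made for illustration.

```python
import hashlib
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sip(records, content_files):
    """Return a ZIP (as bytes) holding a data file, a manifest and content files.

    Hypothetical layout, not InfoArchive's real SIP format.
    records: list of dicts with "id", "title" and an optional "file" key
             naming an entry in content_files.
    content_files: dict mapping file name -> raw bytes.
    """
    # First XML file: the archived records (or metadata about unstructured
    # content), each optionally linking to a content file in the package.
    data_root = ET.Element("records")
    for rec in records:
        rec_el = ET.SubElement(data_root, "record", id=rec["id"])
        ET.SubElement(rec_el, "title").text = rec["title"]
        if "file" in rec:
            ET.SubElement(rec_el, "contentRef").text = "content/" + rec["file"]
    data_bytes = ET.tostring(data_root, encoding="utf-8", xml_declaration=True)

    # Second XML file: a manifest describing the archived data, including a
    # checksum so the receiver can verify the data file arrived intact.
    manifest_root = ET.Element("manifest")
    ET.SubElement(manifest_root, "dataFile").text = "data.xml"
    ET.SubElement(manifest_root, "recordCount").text = str(len(records))
    ET.SubElement(manifest_root, "sha256").text = hashlib.sha256(data_bytes).hexdigest()
    for name in sorted(content_files):
        ET.SubElement(manifest_root, "contentFile").text = "content/" + name
    manifest_bytes = ET.tostring(manifest_root, encoding="utf-8", xml_declaration=True)

    # Bundle both XML files plus the unstructured content into one package.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.xml", data_bytes)
        zf.writestr("manifest.xml", manifest_bytes)
        for name, payload in content_files.items():
            zf.writestr("content/" + name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    sip = build_sip(
        [{"id": "1", "title": "Claim form", "file": "claim1.pdf"},
         {"id": "2", "title": "Policy record"}],
        {"claim1.pdf": b"%PDF-1.4 placeholder"},
    )
    with zipfile.ZipFile(io.BytesIO(sip)) as zf:
        print(sorted(zf.namelist()))  # prints ['content/claim1.pdf', 'data.xml', 'manifest.xml']
```

The point of the sketch is only the division of labor the SIP format implies: one file carries the data and links to content, while the other describes what the package holds. A real SIP would have to follow the schemas InfoArchive expects.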

By making this code available as open source, EMC lets developers grab it and go. They can also contribute to the InfoArchive SIP SDK, imagine new possibilities and join a community around the product.

But that's not the only news van Rotterdam seemed interested in conveying. He also pointed out that both he and the management at EMC's ECD are actively leaning into open source, a posture that still seems new to those who have known EMC as a very proprietary tech company that keeps everything close to the vest.

Those days are numbered, of course — not only because the Dell/EMC takeover deal could close as soon as later this month, but also because EMC ECD seems destined for a future independent of either company. 

Title image Hidden Gems in Pebble Creek (Public domain)