What Good Is an In-Hadoop JSON Database? #StrataHadoop

MapR CMO Jack Norris won’t tell us if an IPO is still in the cards this year for his big data crunching startup.

But he had plenty to share about the news his company is making at Strata + Hadoop World, which runs through Thursday at Javits Center in New York City.

“We’ll be introducing the industry’s first In-Hadoop document database,” he said, noting that MapR-DB can now support native JSON (JavaScript Object Notation).

The in-Hadoop document database can run side-by-side with Spark and with MapR’s key-value NoSQL database, which Forrester ranks as a Leader for “web scale operations that need to scale across thousands of servers and millions of users with extremely quick and optimized retrieval.”

Even More Applications

The win for MapR customers is that they’ll soon be able to support a broader (“the broadest,” according to Norris) set of applications on a single cluster, increasing what can be brought to the data.

Given that 18 percent of MapR customers are running 50 use cases or more on a single cluster, it’s part of a solution that’s likely to be well received.

The developer preview is available for download now. It will become generally available next quarter.

We asked Constellation Research analyst Doug Henschen what the impact might be on other Hadoop distro providers like Cloudera and Hortonworks.

“Cloudera and Hortonworks couldn't and wouldn't do this with HBase,” he said. “For one thing, Cloudera has a tight partnership with MongoDB. MapR customers go for performance, taking a pragmatic view when it comes to open-source purity.”

Different Perspectives

In other words, MapR, Cloudera and Hortonworks walk varied paths when it comes to Hadoop.

“You’re clearly making a commitment to a single vendor when you choose MapR, but the same can be said for Cloudera,” he added.

And Hortonworks? “Even Hortonworks, which stresses it’s 100 percent open source, makes choices that won't necessarily be followed by all of the members of the Open Data Platform Initiative (ODPi) to which it belongs,” said Henschen. “You have to be clear on your priorities and understand the long-term implications of your choice when choosing a platform.

“Swapping out Hadoop distributions, no matter which ones, would be a messy proposition a few years down the road.”

The Value of Open Source

If you agree with Henschen, then what we’ve been told about open source, that the real value comes from the members of the community who build it, may have little to do with reality.

But Hortonworks uses this as a selling point when it comes to Hadoop2. It boasts about the large number of committers it has on staff and the expertise they are assumed to have on Apache Ambari, the idea being that he who built it is the best to support it.

Cloudera boasts something similar with Spark. Among Hadoop distro providers, Cloudera has the largest number of committers, four, on the Apache project.

Hortonworks has one. MapR, IBM, Pivotal and the like have none.

That being said, Spark development is overwhelmingly done by a single firm, Databricks, which believes that Hadoop may be redundant and too hard to use. Many Spark users won’t need to touch Hadoop at all, Patrick Wendell, a co-founder of Databricks, told CMSWire during a conversation in June.

That may be why MapR and Hortonworks are taking a “take what you need to serve your customers best” approach rather than placing what could be a big and irretrievable bet. But then again, it could also mean a big win.

A Win – Or What?

At the end of the day, though, what is the impact of the MapR news?

It will be a win for MapR customers “who are sold on doing everything they can with that vendor's performance-oriented version of HBase, which is MapR-DB,” said Henschen.

It could also remove the shiny finish from Cloudera’s Kudu announcement.

But NoSQL DB vendors probably don’t need to worry that it will make a big dent in their markets, said Henschen.
