More than a thousand big data enthusiasts from all over Europe will gather at the Hadoop Summit at the Dublin Convention Center in Ireland tomorrow and Thursday to hear Hortonworks CEO Rob Bearden and other executives reveal their vision for the future of data.
It's not just about leveraging big data stored in Hadoop or data streaming in Spark anymore, but also data in motion that may never find its way into a data lake.
"Modern data applications won't be driven by Hadoop alone," Matt Morgan, vice president, product and alliance marketing, Hortonworks, told CMSWire.
He offered driverless cars and actuarial decision-making applications as examples. Both data-at-rest and data-in-motion play critical roles, he explained. The analysis of historical information accumulated over time provides an invaluable repository for mining, while data flowing from the Internet of Everything (IoE) offers opportunities to assess and react in real time.
Powering Modern Data Applications
This is where the Hortonworks Connected Data Platform comes into play. It leverages Hortonworks Data Platform (HDP) for data-at-rest and Hortonworks DataFlow (HDF) for data-in-motion. Both consist of components from open source Apache projects and include no proprietary code whatsoever.
But that's not all that the Hortonworks team will be talking about this week.
Hortonworks hosts the Hadoop Summit. And though challengers Cloudera and MapR (most consider the latter a competitor, though MapR CEO Matt Mills does not see it that way) will make presentations at the conference as well, they haven't informed us of anything they will be announcing. Syncsort and Pivotal, complementary providers, will. More about those later.
Next Generation Big Data Security
Headlining Hortonworks' announcements are platform updates, including new and reinforced security and governance features for enterprise readiness, as well as two expanded partnerships.
But one other announcement deserves the spotlight from our point of view.
Remember the Wall Street Journal report that we called into question, the one that claimed Hortonworks was working on a proprietary cybersecurity product?
We said at the time that the much-heralded publication had to be mistaken, that Hortonworks' commitment to open source and the idea of it selling proprietary software didn't mesh. Now there is proof of that.
At the Summit, Hortonworks will announce that it is accelerating the development of Metron, an Apache incubator project and next-generation Security Information and Event Management (SIEM) platform, together with individuals employed by Rackspace, ManTech and B23, among others.
What’s unique about Metron is that it enables at-scale ingestion, correlation, integration and processing across the full stack of applications, system log files, the network and so on.
"We believe that modern threat detection can't take the time to move across 9-10 tools," said Morgan. With Metron, that won't be necessary. If all goes well, the Hortonworks Connected Data Platform will feature first-of-its-kind security.
The Edge of Innovation
Last month Hortonworks announced a new distribution strategy designed to meet enterprises at their own pace rather than requiring them to move to new releases and adopt new functions and features every time they become available.
As a result, customers can choose between Core and Extended Services. Core consists of annual updates to the core Apache Hadoop components (HDFS, MapReduce and YARN) and Apache ZooKeeper, aligned with the ODPi consortium. The edgier Extended Services (including Spark, Hive, HBase, Ambari and more) will be logically grouped together and released continually throughout the year to match the pace of innovation occurring within each project team in the community.
There is little doubt that the new release of Extended Services will be the talk among enthusiasts. It includes both new capabilities and significant improvements born of the decade or more of experience Hortonworks engineers have working with Hadoop and its ecosystem of products.
There is also a first-of-its-kind integration of Apache Ranger for security and Apache Atlas for data governance. Together they empower customers to define and implement dynamic classification-based security policies. Atlas is used to classify data and assign metadata tags, and Ranger then enforces access policies based on those tags. Atlas also provides cross-component lineage, delivering a more extensive view of data across components. This is available in technical preview starting tomorrow.
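For readers who want to see the idea in miniature, here is a rough Python sketch of classification-based access control. It does not use the Atlas or Ranger APIs; `TagCatalog`, `TagPolicy`, `is_allowed` and the asset and group names are all hypothetical, and the rule that untagged data is open to everyone is a simplifying assumption made only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TagCatalog:
    """Hypothetical stand-in for a metadata catalog: asset name -> tags."""
    tags: dict[str, set[str]] = field(default_factory=dict)

    def classify(self, asset: str, tag: str) -> None:
        # Attach a classification tag to an asset.
        self.tags.setdefault(asset, set()).add(tag)

    def tags_for(self, asset: str) -> set[str]:
        return self.tags.get(asset, set())


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: only these groups may read data carrying `tag`."""
    tag: str
    allowed_groups: frozenset[str]


def is_allowed(catalog: TagCatalog, policies: list[TagPolicy],
               asset: str, user_groups: set[str]) -> bool:
    """Deny if any tag on the asset has a policy the user's groups don't satisfy.

    Assumption for this sketch: assets with no tagged policy are open to all.
    """
    asset_tags = catalog.tags_for(asset)
    for policy in policies:
        if policy.tag in asset_tags and not (policy.allowed_groups & user_groups):
            return False
    return True


# Tag one column as PII, then let only the "compliance" group read PII.
catalog = TagCatalog()
catalog.classify("sales.customers.ssn", "PII")
policies = [TagPolicy("PII", frozenset({"compliance"}))]

print(is_allowed(catalog, policies, "sales.customers.ssn", {"compliance"}))
print(is_allowed(catalog, policies, "sales.customers.ssn", {"analysts"}))
```

The point of the design is that the policy is written once against the tag rather than against each table or column, so newly classified data picks up the policy without anyone editing it.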
A new release of Cloudbreak for provisioning Hadoop in any cloud will also be introduced. It expands support for OpenStack for private cloud and Windows Azure Storage Blob (WASB) for Microsoft Azure, and comes with the ability to run scripts either before or after cluster provisioning.
There's also the forthcoming release of Apache Ambari that everyone has been asking about. In its final technical preview it features pre-built dashboards for HDFS, YARN, Hive, and HBase with key performance indicators for cluster health.
Finally, visualization should become a cinch for data scientists via Apache Zeppelin, a browser-based user interface that provides notebook-style capabilities. In its final technical preview, it provides customers with an agile analytics user experience for Apache Spark running on a secure Hadoop cluster.
Partnerships Are a Big Deal
While partnerships are important to tech firms as a rule, they're key to Hortonworks' business model, which counts on associates to help make HDP the platform of choice among enterprises.
You can bet that there will be big smiles on everyone's faces as EMC-VMware spinoff Pivotal and Hortonworks jointly announce that Pivotal will standardize its Hadoop offering on HDP. (This is similar to the relationship Hortonworks has with Microsoft on HDInsight.) In addition, Pivotal and Hortonworks will jointly go to market with an upgrade program for existing Pivotal Hadoop Distribution customers so that they can take advantage of HDP. Hortonworks will also provide support services around the platform.
Getting data into your data lake can be a cumbersome task, but it's vital for enterprises that want to leverage big data analytics.
Tomorrow Hortonworks will announce a partnership with Syncsort to help mutual customers (or soon-to-be mutual customers) accelerate ETL (extract, transform and load) from mainframes onto Hortonworks Data Platform. With the integration, there will be no need to rewrite complex processes or mappings, allowing customers to quickly migrate important data into a cluster while retaining the data’s integrity and provenance. As part of the agreement, Hortonworks will resell Syncsort's DMX-h product line.
Enabling the Future of Data
Needless to say, there is a lot here to digest, but the updates bring with them great opportunity for companies that are ready to grab the elephant by the trunk and ride it to the crest of big data's wave.
Title image courtesy of Hortonworks