Databricks Spark Could Light SAP’s Fire

SAP HANA seems to have taken a bit of a public beating lately, mainly because its creator, Vishal Sikka, and several other notable executives left the company.

And while some might speculate that the Guinness World Record-setting in-memory database has had its best days, there are very few facts to support that contention. In fact, we say the best may be yet to come.

After all, SAP HANA hasn’t yet penetrated most enterprises, and SAP as a whole has only lately gone all-in on the cloud.

Just Getting Started

Couple that with big data, the kind that requires Hadoop for handling, and it’s reasonable to say that big data, too, is only now coming into the enterprise and into the cloud. Fewer than 10 percent of Global 500 corporations are using it in production.

And while SAP HANA has integrated with various Hadoop distros for some time, its relationship with Apache Spark and with Databricks, which makes Hadoop more valuable, is still at a nascent stage. It could grow into something that provides insights the likes of which we haven’t seen before.

The possibilities here could be disruptive because contextual data (social, predictive, machine learning, text, graph and geospatial analysis; in short, most of what we call big data) and real-time operational data (data from ERP, CRM, supply chain and inventory management) have seldom been looked at together.

“The union of Spark and HANA is designed to change that,” writes Databricks’ Arsalan Tavakoli-Shiraji in a blog post.

First a Little Catch-Up

For anyone who is not already familiar with Databricks, it’s a next-generation big data platform that delivers faster, easier, and more sophisticated big data processing in Hadoop clusters. It consists of a single framework for data streaming, graph processing, machine learning, etc., so users don’t have to set up and integrate everything manually themselves.

Apache Spark, on which Databricks is based, may very well be the most active Apache project in the world (in terms of contributors), which means that it is growing and getting even more useful at an incredibly rapid rate.

Announcing Databricks Cloud

Databricks (the company) just announced a Spark-based cloud platform, Databricks Cloud, which helps companies get value out of big data more quickly by eliminating the challenges associated with infrastructure.

This frees developers to focus their time and energy on building end-to-end analytics applications and makes data scientists’ work less of a hassle.

At the end of the day, Databricks helps companies get more value out of big data more quickly.

Databricks + SAP + Cloud = Smarter Enterprises

SAP just announced a Databricks-certified Apache Spark distribution for the SAP HANA platform. It’s downloadable free of charge from SAP HANA’s site.

The companies say that they will work in partnership to bring together two powerful technologies to better enable enterprises to derive value from their data.

Databricks, in its blog post, explains what the synergy means:

“Beyond their individual capabilities, the true power of this integration is the ability of Spark and HANA to work closely together. Rather than performing a simple ‘select *’ query to grab a full data set, Spark can push down more advanced queries (e.g., complex joins, aggregates, and classification algorithms) – leveraging HANA’s horsepower and reducing expensive shuffles of data. A similar mechanism works for HANA users, where TGFs (Table Generating Functions) and Custom UDFs (User Defined functions) provide access to the full breadth of Spark’s capabilities through the Smart Data Access functionality.”
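
To make the pushdown idea in that quote concrete, here is a minimal sketch. It is not the Spark or HANA API: Python’s built-in sqlite3 module stands in for HANA, plain Python stands in for the Spark side, and the sales table, its columns and all function names are hypothetical, chosen only for illustration.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Build an in-memory database standing in for the remote store (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("west", 5), ("east", 7), ("west", 3), ("north", 4)],
    )
    return conn


def totals_without_pushdown(conn):
    """Fetch every row ('select *') and aggregate on the client side."""
    rows = conn.execute("SELECT * FROM sales").fetchall()
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    # The second value is the number of rows that had to leave the database.
    return dict(totals), len(rows)


def totals_with_pushdown(conn):
    """Let the database aggregate and return only the summary rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    naive, moved_naive = totals_without_pushdown(conn)
    pushed, moved_pushed = totals_with_pushdown(conn)
    print(naive == pushed, moved_naive, moved_pushed)
```

Both functions arrive at the same totals, but the pushed-down version moves three summary rows out of the database instead of five raw ones. On tables with millions of rows, that gap is roughly what the quote means by reducing expensive shuffles of data.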

The beauty of Databricks and SAP HANA tools working together is that businesses will be able to leverage information from different domains and have the insights they need to work smarter.

Retailers, for example, will be able to integrate things like social media trends and inventory analysis; healthcare providers will be able to make better staffing decisions by integrating patient data with epidemiological information; and utilities will be able to combine sensor data with billing systems to deliver personalized resource and cost-saving recommendations.

Taken together, Databricks and SAP HANA can provide the tools and information that enterprises need to act smarter and make better decisions.
