Despite several bursts of passion for enterprise CMS, at heart EMC has always been a data storage and management company. Its experience managing large content, case and records assets in the enterprise is part of the attraction.

With its upcoming acquisition of Greenplum, a privately held data warehousing company, it's becoming even clearer where EMC's true heart lies: at the intersection of big data computing and business analytics.

Current EMC Landscape

Aside from its ECM efforts, the company is largely concerned with the following:

  • Information infrastructure solutions
  • Security through RSA
  • Backup, recovery and archiving via the Data Domain acquisition
  • Virtualized infrastructure (VMware included)

Why is EMC attracted to Greenplum? It is best summed up in the big words of Pat Gelsinger, president and COO of EMC Information Infrastructure Products:

The data warehousing world is about to change. Greenplum's massively-parallel, scale-out architecture, along with its self-service consumption model, has enabled it to separate itself from the incumbent players and... shift toward 'big data' analytics.

What Does Greenplum Bring to the Table?

Well, all things "big" data. And big analytics and business intelligence. Not the FatWire level of analytics. A lot more than that.

As we all know by now, we live in a world well beyond terabytes: petabytes at the very least. And the volume just keeps growing. While big data is hardly a new phenomenon anymore, many organizations are still looking for ways to manage those amounts of data efficiently.

And, more importantly, they're looking to be able to analyze that data.

Greenplum has this nifty "shared-nothing" massively parallel processing (MPP) architecture that has been designed for analytical processing in virtualized x86 infrastructures. Coincidentally, EMC’s storage products are x86-based as well.

Figure: Greenplum Architecture

As Greenplum notes:

In this architecture, data is automatically partitioned across multiple 'segment' servers, and each 'segment' owns and manages a distinct portion of the overall data. All communication is via a network interconnect -- there is no disk-level sharing or contention to be concerned with (i.e. it is a 'shared-nothing' architecture).
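To make the shared-nothing idea concrete, here is a minimal Python sketch, assuming a simple hash distribution on a single key and in-process lists standing in for segment servers. The names NUM_SEGMENTS, segment_for, distribute and parallel_total are hypothetical and not part of Greenplum. The point is only that each segment owns a distinct slice of the rows and answers its share of a query locally, so just the small partial results travel over the interconnect.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sketch: these names are illustrative, not Greenplum APIs.
NUM_SEGMENTS = 4


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick the owning segment from a stable hash of the distribution key."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field):
    """Partition rows so each segment owns a distinct slice of the data."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field])].append(row)
    return segments


def local_sum(segment_rows, value_field):
    """Each segment scans only its own rows: no shared disk, no contention."""
    return sum(r[value_field] for r in segment_rows)


def parallel_total(rows, key_field, value_field):
    segments = distribute(rows, key_field)
    # In a real MPP system these partial sums would run concurrently on
    # separate servers; here they run one after another for simplicity.
    partials = [local_sum(seg, value_field) for seg in segments.values()]
    # Only the small partial results are combined centrally.
    return sum(partials)


orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 12},
    {"customer": "carol", "amount": 8},
    {"customer": "alice", "amount": 50},
]
print(parallel_total(orders, "customer", "amount"))  # 100
```

Hashing on the distribution key keeps every row for a given customer on the same segment, which is what lets per-key work stay local.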

Then there's the MapReduce integration, which lets developers and DBAs execute both MapReduce and SQL in Greenplum's parallel dataflow engine. With MapReduce comes the ability to run analytics on petabytes of data, inside or outside of Greenplum's database.
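The "SQL plus MapReduce" idea is easier to see with a toy example. The sketch below is plain Python and does not use Greenplum's actual MapReduce interface; the function names are hypothetical. It expresses a GROUP BY aggregate as map, shuffle and reduce steps, then runs the same query through SQLite (standing in for a SQL engine, not for Greenplum) to show that both paths produce the same answer.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical example mirroring:
#   SELECT region, SUM(sales) FROM orders GROUP BY region;


def map_phase(record):
    """Emit (key, value) pairs; here one pair per input record."""
    yield record["region"], record["sales"]


def shuffle(pairs):
    """Group values by key, as the engine does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)


def map_reduce(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


orders = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 5},
    {"region": "east", "sales": 7},
]
print(map_reduce(orders))  # {'east': 17, 'west': 5}

# The same aggregate as SQL, using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(o["region"], o["sales"]) for o in orders],
)
sql_result = dict(conn.execute("SELECT region, SUM(sales) FROM orders GROUP BY region"))
conn.close()
```

In a parallel engine, the map and reduce steps would run on many segments at once, with the shuffle moving each key's values to a single place.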

Figure: Greenplum Parallel Dataflow: SQL + MapReduce

As you can see, data management on its own is only part of the appeal. Being able to analyze all of your data is another advantage.

And on top of that, there's Greenplum Chorus -- a commercial enterprise data cloud platform that provides the benefits of private cloud computing for organizations of all sizes. Combined with EMC's virtualized Private Cloud infrastructure, it only gets better.

Greenplum claims it can deliver 10 to 100 times the performance of traditional database software at a lower cost. And while Aster Data offers a very similar product, and ParAccel (one of Greenplum's competitors) may as well file for divorce from EMC, EMC chose Greenplum.

As outlined in a blog post by Chuck Hollis, VP and Global Marketing CTO at EMC, some of the biggest value propositions may include:

Simply put, data computing is a great use case for a private cloud....

And, finally, let’s not forget the seductive appeal of running on-demand business analytics as yet another fully virtualized workload use dynamic resources in a private cloud model...

It's interesting to note that companies like Skype, Equifax and T-Mobile are existing Greenplum customers.

Big Deal?

Not that big. We estimate Greenplum's value at about US$100M+, considering that the company raised about US$61M in VC funding over its seven-year history.

And according to EMC, "the acquisition is not expected to have a material impact to EMC GAAP and non-GAAP EPS for the full 2010 fiscal year." The deal is expected to be an all-cash transaction, closing in Q3 2010.

Upon completion of the acquisition, Greenplum will be turned into a new data computing division as part of EMC's Information Infrastructure business. Sounds like a data warehousing appliance is on the way?

Despite being featured as a visionary in Gartner's DBMS Magic Quadrant, is Greenplum under EMC's wing a threat to the DBMS market? Unlikely. There's still too much "big" data for a handful of vendors to chew on. Oracle, Teradata, SAP/Sybase, IBM and Microsoft will share the platter of a consolidating market, even though Greenplum is highly optimized for large-scale data warehousing.

Nevertheless, 'tis the time to get a handle on that big data of yours.