
EMC Intends to Expand Big Data Computing with Greenplum Buy

Although it has been swept up in more than one bout of Enterprise CMS passion, EMC has at heart always been a data storage and management company. Having experience with managing large content, case and records assets in the enterprise is part of the attraction.

With the upcoming acquisition of Greenplum, a privately held data warehousing company, it's becoming even clearer where EMC's true desires lie: at the intersection of big data computing and business analytics.

Current EMC Landscape

Aside from its ECM efforts, the company is largely concerned with the following:

  • Information infrastructure solutions
  • Security through RSA
  • Backup, recovery and archiving via the Data Domain acquisition
  • Virtualized infrastructure (VMware included)

Why is EMC attracted to Greenplum? The answer is best summed up in the words of Pat Gelsinger, president and COO of EMC Information Infrastructure Products:

The data warehousing world is about to change. Greenplum's massively-parallel, scale-out architecture, along with its self-service consumption model, has enabled it to separate itself from the incumbent players and… shift toward 'big data' analytics.

What Does Greenplum Bring to the Table?

Well, all things "big" data. And big analytics and business intelligence. Not the FatWire level of analytics. A lot more than that.

As we all know by now, we live in a world that has moved beyond terabytes into petabytes of data, and is heading toward yottabytes. And it just continues to grow. While "big data" is hardly a novelty anymore, many organizations are still looking for ways to manage those amounts of data efficiently.

And, more importantly, they're looking to be able to analyze that data.

Greenplum has this nifty "shared-nothing" massively parallel processing (MPP) architecture that has been designed for analytical processing in virtualized x86 infrastructures. Coincidentally, EMC’s storage products are x86-based as well.

Greenplum Architecture

As Greenplum notes:

In this architecture, data is automatically partitioned across multiple 'segment' servers, and each 'segment' owns and manages a distinct portion of the overall data. All communication is via a network interconnect — there is no disk-level sharing or contention to be concerned with (i.e. it is a 'shared-nothing' architecture).
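To make the "shared-nothing" idea concrete, here is a minimal Python sketch of hash-distributing rows across segments and then aggregating them with a scatter/gather step. It is not Greenplum code: the segment count, the field names, the CRC32 hash and the helper functions (`segment_for`, `distribute`, `local_sum`, `gather`) are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical segment count, for illustration only


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Choose the owning segment by hashing the distribution key."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place each row on exactly one segment; no row is shared."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(segment_rows, group_field, value_field):
    """Work a single segment does using only the rows it owns."""
    totals = defaultdict(float)
    for row in segment_rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def gather(partials):
    """Merge the per-segment partial results into one answer."""
    merged = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


rows = [
    {"customer": "acme", "region": "east", "amount": 120.0},
    {"customer": "globex", "region": "west", "amount": 80.0},
    {"customer": "initech", "region": "east", "amount": 30.0},
    {"customer": "acme", "region": "west", "amount": 50.0},
    {"customer": "umbrella", "region": "west", "amount": 25.0},
]

segments = distribute(rows, "customer")
partials = [local_sum(seg, "region", "amount") for seg in segments]
totals = gather(partials)
print(totals)
```

Each segment computes its partial sums without looking at any other segment's rows, and only the small partial results travel to the gather step. That is the property the quote describes: communication happens over the interconnect, never through shared disks.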

Then there's the MapReduce integration, which allows developers and DBAs to execute both MapReduce and SQL in Greenplum’s parallel dataflow engine. With MapReduce comes the ability to run analytics on petabytes of data, in or outside of Greenplum's database.

Greenplum Parallel Dataflow: SQL + MapReduce
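As a rough illustration of why MapReduce and SQL can share one engine, the Python sketch below computes the same aggregate two ways: once as map, shuffle and reduce steps, and once as a SQL `GROUP BY`. The sample events and function names are made up, and the standard-library `sqlite3` module stands in for a database purely for demonstration. It says nothing about Greenplum's own MapReduce interface.

```python
import sqlite3
from collections import defaultdict

# Made-up sample data: (event_type, hits)
events = [
    ("checkout", 3),
    ("search", 1),
    ("checkout", 2),
    ("login", 1),
    ("search", 4),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for event_type, hits in records:
        yield event_type, hits


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


mapreduce_result = reduce_phase(shuffle(map_phase(events)))

# The same question asked in SQL, against an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, hits INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)
sql_result = dict(
    conn.execute(
        "SELECT event_type, SUM(hits) FROM events GROUP BY event_type"
    ).fetchall()
)
conn.close()

print(mapreduce_result == sql_result)
```

The shuffle step plays the role of `GROUP BY` and the reduce step plays the role of `SUM`, which is why a single parallel dataflow engine can plausibly serve both styles of query.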

As you can see, data management on its own is only part of the interest. Being able to analyze all your data is yet another advantage.

 
