Open source business intelligence provider Pentaho has announced native support for substantially more big data sources, including the latest distribution of Hadoop and many other NoSQL sources. The expansion should make it easier for Pentaho’s users to connect to these repositories and analyze their ever-expanding stores of data.

A Move Forward for Big Data Analytics

Both data sizes and open source adoption have grown significantly in many enterprises. However, as both increase, many technology departments are challenged with figuring out how to make their mesh of tools integrate. The equation is even more complex for those that have elected to use tools from the notoriously complex NoSQL market. Pentaho has made that a little easier.

The company has expanded its big data integration capabilities to support most of the major NoSQL, OLAP and OLTP databases, making it faster and less complex for users to create reports and analyze data in these repositories. Pentaho has been focused on big data for some time; it first introduced support for Apache Hadoop and Hive a little over a year ago. However, its efforts didn’t stop there. Now, support has been expanded to include not only Apache’s distribution, but also Cloudera’s and EMC’s Greenplum HD. In addition to Hadoop, Pentaho is supporting other NoSQL sources like MongoDB and HBase.

Expanded Support for OLAP, Traditional Data Sources

Although NoSQL is sexy now, not all organizations that have big data needs are ready to adopt NoSQL tools. These companies rely on more traditional OLAP repositories that existed before NoSQL became a buzzword. Pentaho’s business intelligence suite supports these organizations as well. The latest enhancements include native SQL generation and native bulk loader integration for a number of popular OLAP tools like EMC Greenplum, IBM Netezza and Teradata.

In addition to big data tools, Pentaho also improved the performance of connecting to traditional relational data sources like IBM DB2, Microsoft’s Access and SQL Server by expanding its native integration capabilities.

Why is native integration important for analytics tools? Native integration offers the fastest performance because the software has fewer communication layers to navigate, and the most features because it can take advantage of vendor-specific “tweaks.”

Additional information on the expanded big data capabilities is available on Pentaho’s site.