Pentaho, a provider of an open-source business intelligence (BI) platform, has announced its latest release. Like the rest of the information management world, the company is squarely focused on supporting -- yes, you guessed it -- big data.
New Features, New Name
The marketing and technology teams have been busy at Pentaho. The open-source BI vendor has a new release and a new name for its BI suite: Pentaho Business Analytics. The BI suite previously had the more mundane moniker "Pentaho Business Intelligence Suite," which is honestly just crazy for a business intelligence suite by Pentaho, right? According to Pentaho, the rebranding better represents the company’s “comprehensive and integrated business intelligence, data integration, data mining and predictive analytics capabilities.”
In addition to a new name, Pentaho’s BI platform has gained several performance improvements, support for in-memory caching and new connectors to big data platforms. One of the biggest enhancements in the release is support for distributed, in-memory data analysis and aggregation. In the spirit of open source, Pentaho elected not to roll its own in-memory support; instead, the company lets users plug in popular existing in-memory caching platforms: Infinispan/JBoss Enterprise Data Grid and Memcached. You can also extend Pentaho’s BI suite to support additional in-memory caching solutions.
You may be wondering why in-memory caching is a noteworthy enhancement. In-memory caching support in Pentaho can substantially improve performance, scalability and availability. In-memory caching allows applications to read frequently requested data from RAM instead of from disk. When the requested data is already in the cache, this reduces the load on the database and avoids the physical read from the database, which is expensive in terms of both time and resources consumed. The use of in-memory caching is common in web applications that have high availability requirements and must serve large numbers of users. Sites such as Amazon, Facebook and Yahoo all have sophisticated caching strategies. Now, in-memory caching is increasingly being applied to data applications that have their own challenging performance needs.
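To make the idea concrete, here is a minimal sketch of the general cache-aside pattern: check RAM first and go to the database only on a miss. It is not Pentaho's implementation. The `CachedQueryRunner` class, the `fake_database` function and the sample query are all hypothetical names invented for illustration, and a plain Python dictionary stands in for a real cache server such as Memcached or Infinispan.

```python
from typing import Callable, Dict


class CachedQueryRunner:
    """Cache-aside wrapper: look in RAM first, fall back to the database."""

    def __init__(self, run_query: Callable[[str], list]):
        self._run_query = run_query         # hypothetical function that hits the database
        self._cache: Dict[str, list] = {}   # a dict standing in for a real cache server
        self.db_reads = 0                   # counts how often the database is actually read

    def fetch(self, sql: str) -> list:
        if sql in self._cache:              # cache hit: no database read at all
            return self._cache[sql]
        self.db_reads += 1                  # cache miss: pay for one database read
        rows = self._run_query(sql)
        self._cache[sql] = rows             # keep the result in memory for next time
        return rows


def fake_database(sql: str) -> list:
    # Stand-in for a slow aggregate query against a real data warehouse.
    return [("EMEA", 1200), ("APAC", 950)]


runner = CachedQueryRunner(fake_database)
query = "SELECT region, SUM(sales) FROM orders GROUP BY region"
first = runner.fetch(query)    # miss: goes to the database
second = runner.fetch(query)   # hit: served from memory
print(runner.db_reads)
```

The second call returns the same rows without touching the database, which is where the performance gain comes from. A production cache would also need an expiry or invalidation policy so that reports do not show stale data; that is left out of this sketch.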
Even if your application isn’t serving as many users as an Internet giant, in-memory caching can make your BI solution faster. In addition, because most modern in-memory caching solutions are distributed, you can handle more connections and shield users from brief database outages, since cached application data is retrieved from a pool of commodity caching servers rather than from the database. This is a big win for Pentaho users. However, this is not an excuse to throw time-tested design principles out the window. No amount of technical trickery will ever save a poorly implemented analytics solution.
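The availability benefit can be sketched in a few lines as well. The example below is an illustration under stated assumptions, not a description of how Pentaho, Memcached or Infinispan work internally: `CachePool`, `FlakyDatabase` and `fetch` are hypothetical names, each cache "server" is just a dictionary, and keys are assigned to servers with a simple hash-and-modulo scheme.

```python
import zlib
from typing import Dict, List, Optional


class CachePool:
    """A pool of cache nodes; each key lives on exactly one node."""

    def __init__(self, node_count: int):
        self.nodes: List[Dict[str, list]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> Dict[str, list]:
        # Pick a node by hashing the key (simple modulo placement, an assumption).
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def get(self, key: str) -> Optional[list]:
        return self._node_for(key).get(key)

    def set(self, key: str, rows: list) -> None:
        self._node_for(key)[key] = rows


class FlakyDatabase:
    """Hypothetical database that can be switched off to simulate an outage."""

    def __init__(self) -> None:
        self.up = True

    def query(self, key: str) -> list:
        if not self.up:
            raise ConnectionError("database unavailable")
        return [(key, 42)]


def fetch(pool: CachePool, db: FlakyDatabase, key: str) -> list:
    cached = pool.get(key)
    if cached is not None:        # served from the cache pool, database not involved
        return cached
    rows = db.query(key)          # only a cache miss reaches the database
    pool.set(key, rows)
    return rows


pool = CachePool(node_count=3)
db = FlakyDatabase()
before = fetch(pool, db, "sales_by_region")   # warms the cache
db.up = False                                 # simulate a brief outage
during = fetch(pool, db, "sales_by_region")   # still answered from the cache
```

Only data that was cached before the outage survives it. A request for an uncached key still fails in this sketch, which is one reason caching complements sound design rather than replacing it.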
In addition to the changes to support distributed in-memory caching, Pentaho has expanded its big data tooling. The analytics suite now includes native support for EMC's Greenplum database and the Hadoop-based Apache Hive data warehouse. Native support provides improved performance for interactive data exploration and allows users to schedule reports for background execution. Pentaho already included support for a number of other big data tools, including several NoSQL stores.
Getting More Information
If you would like more information about the release, Pentaho is conducting a free webinar, “Deploying Extreme-Scale In-Memory Analytics with Pentaho” on Thursday, November 10. You should be able to register on Pentaho’s event page, but the event was not listed at last check. If you’ve had enough discussion about the release and would prefer to try it out instead, download the latest release.
Do you think Pentaho’s focus on caching to improve processing of big data is the right move? Let us know your thoughts.