South by Southwest Interactive (SXSWI) attendees with an affinity for big data, or data geeks as they are affectionately called at SXSWI, had plenty to do Sunday and Monday in Austin. Those interested in big data might be a bit more technical than the average SXSWI badge holder, judging by the few open seats in the sessions -- something of a rarity at the busy, almost week-long emerging technology conference. Big data has found its way to SXSWI, and the few, the brave, the data-enamored are smiling, because they know somebody has to be an expert in managing all of that web and Enterprise 2.0 data.

Big Days for Big Data

Sunday morning, instead of enjoying brunch, data enthusiasts took in an early panel, “Death of the Relational Database,” by KloudCo CEO Hank Williams. The session focused on the introduction of technologies, such as NoSQL and graph databases, that challenge the supremacy of the 30-plus-year-old relational database. The focus on emerging big data technologies continued after lunch, with two NoSQL sessions occurring simultaneously: “Embracing NoSQL: Your First Cassandra Project” and the edgily named “Solr Power FTW: Make NoSQL Your Bitch!” As usual at SXSWI, the day didn’t conclude with panels and technical hallway banter. DataStax and Infochimps rewarded data enthusiasts with the Data Cluster Party. Momentum continued Monday with the “Big Data and APIs for PHP Developers” workshop, followed by “A Billion Columns? No problem: an Introduction to the Cassandra Database” and “Big Data for Everyone (No Data Scientists Required).”

SXSWI Attendees Embrace NoSQL

Two sessions focused on the popular column-based NoSQL data store Cassandra, which was created at Facebook and later used by high-volume sites such as Twitter. Although neither session was intended to create a league of new Cassandra experts, “Embracing NoSQL: Your First Cassandra Project” was conducted as a hands-on workshop with code examples and lessons learned from the panelists’ projects. In addition to the code, panelists explained at a high level when and why you might use NoSQL, introduced NoSQL terminology and compared Cassandra to other NoSQL tools such as CouchDB and MongoDB.

Cassandra wasn’t the only NoSQL option at SXSWI. Lucid Imagination, a commercial provider of Solr/Lucene products and services, partnered with Bazaarvoice to present a different notion of a NoSQL data store, one that is not a new NoSQL product. At “Solr Power FTW: Make NoSQL Your Bitch!,” panelists presented a design strategy that used the popular open-source search platform Lucene/Solr, instead of traditional SQL, to query a dataset -- essentially making Solr a NoSQL application over your data. A bit of a stretch? Lucene/Solr:

  • doesn’t use SQL or a relational paradigm
  • supports flexible schemas
  • works with denormalized data

which looks similar to the basic profile of a NoSQL repository, made extra tasty with full-text search, faceting, spell checking and similar-item search. The slides from the presentation are available on SlideRocket. Well played, Lucid Imagination, well played.
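To make the idea concrete, here is a minimal sketch of what querying denormalized, flexibly structured documents with a facet count looks like. It is plain Python, not Solr's API, and it is not taken from the panelists' slides; the sample documents, field names and the `search` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical denormalized review documents. Product details are repeated in
# each record, and fields may differ between documents (flexible schema).
DOCS = [
    {"id": 1, "product": "Trail Shoe", "brand": "Acme", "rating": 5,
     "text": "great grip on wet rock"},
    {"id": 2, "product": "Trail Shoe", "brand": "Acme", "rating": 3,
     "text": "runs small but great cushioning"},
    {"id": 3, "product": "Road Shoe", "brand": "Zoom", "rating": 4,
     "text": "light and fast", "verified": True},
]


def search(docs, term, facet_field):
    """Return documents whose text contains the term, plus counts per facet value."""
    term = term.lower()
    hits = [d for d in docs if term in d.get("text", "").lower().split()]
    # Faceting: count how many matching documents carry each value of the field.
    facets = Counter(d[facet_field] for d in hits if facet_field in d)
    return hits, dict(facets)


hits, facets = search(DOCS, "great", "rating")
print([d["id"] for d in hits])  # [1, 2]
print(facets)                   # {5: 1, 3: 1}
```

A real search platform adds an inverted index, relevance ranking and text analysis on top of this; the sketch only shows why no joins or fixed schema are needed to answer the query.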

Even More Big Data

NoSQL repositories are only one option for managing large information stores. "Big Data for Everyone (No Data Scientists Required)" featured an impressive lineup of solution architects from Twitter, HP, DataStax, Infochimps and Cloudera discussing options for tackling big data management. According to the panelists, massive amounts of data are being created in almost every organization and storage is a low-cost commodity, making it easy for enterprises to amass huge volumes of data that traditional data stores are ill-equipped to process.

Steve Watt of HP provided an overview of Hadoop, an implementation of the Map/Reduce framework that allows huge datasets to be processed in parallel across commodity hardware. Panelists depicted a big data ecosystem that could be leveraged to address challenging big data use cases, such as managing email -- or, in addition, more modern techniques, such as advanced data visualization and data-as-a-service, that organizations can include in their data strategy.

[Figure: a big data ecosystem]
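For readers new to the Map/Reduce model that Hadoop implements, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is plain Python rather than Hadoop's API, it is not from the panel, and the function names are illustrative assumptions. On a real cluster, the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data big days", "data for everyone"])
print(counts["big"], counts["data"])  # 2 2
```

Because each map call sees only its own line and each reduce call sees only one key, the work can be split across many inexpensive machines, which is the property the panel highlighted.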

Big data was also the focus of "Big Data and APIs for PHP Developers." The session was a deep dive into the concept of big data. Although PHP was in the title, the session was not about code; it provided a thorough and insightful introduction to the concepts, tools and techniques of big data. The 165-slide presentation is available online.