Big data and the tools to manage it have become an almost constant topic of discussion, and NoSQL, a general category of non-relational data storage tools designed to store large volumes of loosely structured data, is frequently at the center of these conversations. However, not everyone is enamored with NoSQL. A recent report by Information Week, “Why NoSQL Equals No Security,” suggests that security is barely an afterthought in NoSQL repositories. Is this a valid assessment or just a bit more hyperbole in a market that seems to gravitate toward tabloid-style sensationalism?

Big Data, Big Security Problem

It’s no secret that big data technologies such as NoSQL repositories and Hadoop-based processing platforms are growing explosively. The technologies, which proved their ability to handle Internet-scale data volumes and velocities at a lower cost than traditional platforms at companies such as Yahoo, Google and Amazon, have a massive appeal to technology leaders. Everyone from small, innovative startups to giants such as Oracle, Microsoft and IBM now offers products targeted at managing the large volumes of data organizations are rapidly collecting and seeking to make actionable. Although these technologies are experiencing widespread popularity and adoption, the market is still in an emerging stage -- there are lots of feature gaps and opportunities for improvement. Security is clearly one of those areas.

Security is the focus of Information Week’s recent report, “Why NoSQL Equals No Security.” The report doesn’t just have a contentious title; the content manages to keep pace with the report's name. For example, the first paragraph proclaims,

“Nowhere in the mission statement is protection mentioned, and the fact is, most big data technologies don’t have any security features built in. And guess what? Few ever will.”

While the report concedes that many big data and NoSQL technologies support some form of authentication, it continues with its condemnation of the state of security in the market.

“The big data show is being run by developers, not architects or even system administrators. These developers clearly don’t realize that 14% of all 2011 breaches were caused by compromised database servers.”

The industry assessment in the report suffers from a common weakness with analysis of the big data market. It broadly groups solutions and scarcely acknowledges that distinct categories with varying levels of maturity exist in the market. Hadoop is not a NoSQL repository. NoSQL tools differ tremendously in implementation and purpose. This failure to examine the market as a compilation of unique parts results in the entire market being portrayed as an impending security disaster. Omer Trajman, Vice President Technology Solutions, Cloudera, evaluated the report, saying,

“It [the report] was very convoluted. It treated NoSQL and big data processing tools like Hadoop as the same thing. Both are new ways to manage data, but they are at different stages in security requirements and maturity. Hadoop is much further along than portrayed.”

Other industry leaders had similar perspectives, especially with regard to Hadoop, which has both authentication and authorization features in its various components. Eric Baldeschwieler, CTO, Hortonworks, told us that big data infrastructure is not inherently insecure. Baldeschwieler went on to say,

“With care and use of best practices, Apache Hadoop can be used securely. Hadoop can be deployed using Kerberos for user and service authentication and it has a POSIX-like user & group authorization model for HDFS. Combined with access logging and other features, this has allowed Hadoop HDFS deployments to pass SOX compliance data audits. As the Big Data ecosystem matures I expect to see more complete security features added to all of the major big data storage players.”

David Gorbet, Vice President Product Strategy, MarkLogic, had a similar view. Gorbet explained,

“It’s an overgeneralization to say that NoSQL Databases are inherently insecure. MarkLogic, for example, is a non-relational database that meets the highest grade of government security standards and is used in mission-critical Big Data Applications across the commercial and public sectors. The best advice we can give is for organizations to do their due diligence when looking at security and NoSQL solutions. Organizations should map out specific questions regarding security, high-availability, performance, and transactional guarantees to make sure the potential solution is a good fit.”

This is clearly a much different picture than that painted by the report. Security in big data products is not perfect, but it is far from nonexistent.

Secure Big Data Without the Vitriol

Data security is important. Any company that stores sensitive data -- financial details, customer records or product specifications -- can become a target for attackers. Big data solutions should be held to the same security standards as other technology solutions in the enterprise. 

If you can get past the frequent jabs at developers, the report has a valid underlying message -- organizations should take a more holistic approach to selecting big data tools. The market is in such an embryonic state that features still vary wildly by vendor and product. No one should assume the existence of any capability. Potential customers must look beyond the data storage and processing capabilities of big data tools and also evaluate how well each platform meets non-functional requirements such as security and supportability.