WCM Field Notes is a regular column written in collaboration with Jon Marks (@McBoof), Head of Development at LBi. This second issue looks at what Open Source really means, and suggests ways for you to sensibly include both open source and proprietary systems in your Content Management System selection exercise. 

There seems to be a lot of fear, uncertainty and doubt surrounding open source content management these days. Last week, I was fortunate enough to be asked to speak at a British Computer Society Open Source event, but I was rather surprised by the lack of agreement about what Open Source Software (OSS) actually is.

Many attendees thought that Open Source and Open Standards are one and the same. During the panel debate, one of the delegates even said "I would always select Open Source because it allows me to develop using an agile methodology," ... which really got me growling.

So I'd like to use this post to clarify three somewhat interrelated concepts -- open standards, open source and open data. Once we've done that, I'll offer my thoughts on how to go about including OSS and proprietary options in your CMS Selection RFP on a nice, happy and level playing field.

[Editor's Note: Don't miss the previous edition of WCM Field Notes: The Skinny on JCR, CMIS and OSGi.]

What Does Open Really Mean? 

Concept: Open Standards 

Open standards are what make the Internet possible. The railway gauge (the width between the tracks of a railway) is the classic example used to explain the concept. Back in the day, train tracks were of different widths so people and cargo actually needed to change trains because the one they were on didn't fit on the next track.

Once the width was standardized, life became a whole lot easier for everyone. Open standards are the railway tracks of the web. I was planning to stretch this analogy further, with the software being the trains and the data being the cargo, but my wise colleagues advised me not to. The analogy, er, quickly fell off the rails.

For a standard to be truly open, it should have been created in a transparent way and should be available for anyone to use. Some people believe that a standard cannot be truly open unless it comes with an open source reference implementation.

I don't buy this -- the standard is nothing to do with the source code, although having a reference implementation certainly helps. There are several bodies that work extremely hard to develop and foster open standards. The ones that most affect my world are the W3C, IETF, OASIS and the JCP Program.

We have many useful open standards. We have low-level standards that are the plumbing of the Internet, such as TCP/IP, DNS and HTTP. We have standards that allow us to make web pages that work on multiple browsers and devices (XHTML, CSS) and perform clever interactions (XMLHttpRequest to support AJAX). We have accessibility standards (WCAG, WAI-ARIA) that help ensure all users can access the pages. And we have semantic and classification standards (RDF, Dublin Core) that help machines understand and use the content too.

Higher up the chain, standards get more domain-specific. There are many data format standards. If you judge a standard by its adoption (which is the best way), then XML was the most successful standard of the last decade.

There are standards for content syndication, for authenticating users across many applications, for the creation of "widgets", for portability across social networks and almost anything else you can think of.

SQL was a wildly successful standard that allowed us to store and access content. The Java Content Repository (JCR) standard is a well-known standard specific to content management, and Content Management Interoperability Services (CMIS) is a newly emerging one. I talked about the JCR and CMIS in the previous WCM Field Notes column. And I've created An Incomplete Directory of Open Standards for those who want a more complete list.

Image Credit: Rob Cottingham

Concept: Open Source

True open source software is software that is licensed under specific open terms (free and redistributable) and developed using a particular open process, part of which includes full access to the source code, for anyone. That's really about it. A good definition of OSS can be found on the Open Source Initiative website.

My friend Justin Cormack came for a beer and a chat after the BCS event. He has written an excellent blog post helping to distill the essence of OSS, and its impact on content management. He says:

open source...started with developers, about more efficient ways of building, architecting and delivering software; in terms of influence on the end users it is still small.

This is very important. The fact that a product is open source should not matter much to anyone except the development teams. And maybe those signing the checks and the lawyers, but we'll talk about this later.

Justin also recommends reading The Cathedral and the Bazaar, an essay written over ten years ago by Eric S. Raymond. It analyzes how one successful open source project worked and explores the argument that "Given enough eyeballs, all bugs are shallow."

The essay wonderfully captures the spirit and power of OSS as a vehicle for software development. The Cathedral refers to organized, closed development and the Bazaar to the mayhem of true, open development. I've always wondered if Raymond picked a cathedral to imply some link to the religious debate between Grand Design versus Evolution -- the proprietary versus OSS debate can get pretty religious at times.

Eric S. Raymond Describing The Cathedral vs. The Bazaar

Good OSS will heartily embrace open standards. But so will good proprietary software. Be warned, though -- there is bad software of all types out there that ignores standards.

Image Credit: Rob Cottingham

Concept: Open Data

The open data philosophy holds that certain data should be free and available to anyone. There are several justifications for this -- it could be because the research to produce the data was paid for by taxpayers, or because of the belief that you cannot put a copyright on facts. Or because openness is simply better. Some people -- like Hans Rosling of GapMinder.org -- even think that all educational materials paid for by public funds should be made open and accessible to all.

Data is not considered open if there are licenses preventing re-use of the data, if only certain individuals (for example, registered members on a web site) can get at it, or if the storage format makes it difficult to access.

So we have overlap with open source (licensing and copyright models) and open standards (storage and interchange formats), but it is a distinct concept.

Tim Berners-Lee, the man credited with inventing the Web, is one of the loudest voices in favor of open data. In the video below he attempts to explain the concepts of open data and linked data to a non-technical audience -- and the choice of language is at times rather amusing. His flower analogy is, however, a powerful illustration.

Tim Berners-Lee: The Next Web of Open, Linked Data

The most visible open data project in the last 10 years has been the Human Genome Project. You can get free access to this data (all 150GB of it) now from various sources. In fact, it is one of the popular Public Data Sets hosted on the Amazon Web Services platform.

The list of categories of datasets on the platform gives a good insight into the kind of data that has already been made open: Astronomy, Biology, Chemistry, Climate, Economics, Encyclopedic, Geographic and Mathematics.

A good example of an open data project is OpenStreetMap -- "a free editable map of the whole world". And the UK Government is planning to open up the UK Postcode data in 2010. Currently you have to pay to use this data. As more and more data becomes open, we'll see more clever and useful applications of it.

Image Credit: Rob Cottingham

Evaluate Open Source Fairly

Now that we know what OSS really is, we need a way to decide if it is the right choice for us. In my daily work I see a lot of CMS shortlists. However, virtually all of these shortlists are composed either entirely of proprietary systems or entirely of open source systems. The "to open source or not to open source" decision seems to have been made much earlier, sometimes subconsciously, and almost always for the wrong reasons.