What is an Ontology? And Why We Need Them

Here at the 10th International Protégé Conference in Budapest, Hungary, the word of the day is 'ontology'. Most folks inside these walls know exactly what that means. Most people outside these walls do not. And those of us in the content management space should have a clear idea. Certainly, as the CM and Web technologies around us evolve, we will be left behind if we don't.

Starting at the beginning, an ontology is really such a simple concept that I find it a shame that so many definitions make the idea less clear and more complex sounding. What confuses things further is that the word has its roots in philosophy, and its meaning in the philosophical context differs a bit from what most scientists and industry practitioners mean.

In the philosophical domain, the word ontology tends to mean a branch of metaphysics concerned with the study of existence -- the work of trying to understand the basic structure of our world. This definition sheds some light on what is meant outside of philosophy, but by no means takes us (or at least me) all the way home.

In the scientific and commercial realms, an ontology tends to mean a thing that is produced rather than a field of study, and from there we get a host of rather inaccessible explanations about what that thing is. Here's one from Tom Gruber for starters: "An ontology is a specification of a conceptualization." Or how 'bout this: "Ontologies are 'specifications of a relational vocabulary'." Here's what Wikipedia says: "An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts."

Got it? Right then, can I see yours?

Oh, what, sorry, what's that? Oh, you said "What's a 'conceptualization'? What is a 'domain'? What's a 'relational vocabulary'?"

Right, well, I thought we all talked like that, but for those who don't, a 'conceptualization' is a bunch of concepts pertaining to some slice of reality. A 'domain' is a particular slice of reality -- for example, your company's Web publishing operations, or Joe's pizza restaurant down the block, or the human genome. And then that 'specification' word, well, we're all at least somewhat familiar with that, but in this context it means that you're defining and documenting some truth about the slice of reality in question. And I don't have any idea what a 'relational vocabulary' is. But generally, have you got the concept? Kind of? Hmm, OK, let's try this:

An ontology is a detailed model/picture/schema (pick your favorite word) of a slice of reality, based on the facts that we know about that reality. This model/picture/schema describes some of the things that are known about that reality and some of the relationships between those things.

The W3C describes ontology in accessible language: "An ontology defines the terms used to describe and represent an area of knowledge." That's pretty concise.

Barry Smith, who spoke at the conference yesterday, used the example of a map's legend. His statement was that "ontologies do for data what legends do for maps." Barry comes from the bioinformatics area and tends to think in terms of things like the Gene Ontology (GO) project.

But the GO project is not content management land. Content management professionals deal with systems that are decidedly less complex and that tend to have fewer collaborating parties. Nevertheless, it's my opinion that ontologies can be very useful in content management land as well. Organizations like the BBC have demonstrated this rather definitively.
Their work with the Protégé 3.x tool and the Web Ontology Language (OWL 1.0) was extensive and played a fundamental role in both their content modeling exercises and their content management implementation, not to mention their ongoing change management operations.

But what the BBC did was beyond what we as CM professionals might aspire to on average -- they were able to generate 10,000+ lines of XML configuration instructions for their CMS from custom Protégé plugins. In the field of content management, the exercise of building a basic ontology for an operational domain is in itself a valuable and worthy component of a project. Even if we arrive at zero functional outputs (e.g., no generated code, tables or configurations), the exercise of modeling the domain and formalizing what your organization or your client's organization knows about its operations can go a long way towards paving a project's road to success.

And if we can do more with the ontology, that is great. There is some low-hanging fruit, to be sure. Examples include visualization of the ontology graph and documentation of the ontology types (classes) and their properties. And there may be more if we nudge and push the Manchester Protégé team a bit: during an interview today, Nick Drummond of CO-ODE conceded that it would not be that hard to generate controlled vocabulary documentation from the Protégé tool. These things are useful and inspirational.

Just today Theresa Regli of CMS Watch wrote of the need for "consistent standards and consensus around vocabulary." Since I've got the ontology hammer in hand, I can't help but think: Yes, that's exactly right, and I might just know how to nail it.

Protégé is an ontology modeling tool developed collaboratively by Stanford University and the University of Manchester. The tool is written in Java and installs quickly on Windows, Mac and (I believe) Linux machines. Version 3.3 was released on July 6th of this year.
Version 4.0 -- which represents a fairly dramatic architecture shift -- is in alpha, but it seems stable and works fine for most basic operations. Those planning to dig deeply into the tool and/or develop plugins would be well advised to speak to the Protégé team before choosing a version to invest in. There are trade-offs related to the plugin architecture and to compatibility with RDF. Learn more at the Protégé website and the CO-ODE website.
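
For readers who think better in code, here is a minimal sketch of the plain-language definition given earlier: some things in a slice of reality, some of their properties, and some of the relationships between them. It then renders that model as controlled vocabulary documentation, the kind of low-hanging fruit mentioned above. Everything in it is an assumption made for illustration: the `OntologyClass` and `Relationship` types, the `controlled_vocabulary` function and the small Web publishing domain are hypothetical, and none of it is Protégé's API, OWL syntax or the BBC's approach.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OntologyClass:
    """One kind of 'thing' in the domain (hypothetical type, for illustration)."""
    name: str
    definition: str
    parent: Optional[str] = None  # name of the broader class, if any
    properties: Dict[str, str] = field(default_factory=dict)  # property -> value type


@dataclass
class Relationship:
    """A named link between two classes, e.g. Article --writtenBy--> Author."""
    subject: str
    predicate: str
    obj: str


def controlled_vocabulary(classes: List[OntologyClass],
                          relationships: List[Relationship]) -> str:
    """Render the model as plain-text vocabulary documentation, sorted by term."""
    lines = []
    for cls in sorted(classes, key=lambda c: c.name):
        kind = f" (a kind of {cls.parent})" if cls.parent else ""
        lines.append(f"{cls.name}{kind}: {cls.definition}")
        for prop, value_type in sorted(cls.properties.items()):
            lines.append(f"  - {prop}: {value_type}")
        for rel in relationships:
            if rel.subject == cls.name:
                lines.append(f"  - {rel.predicate} -> {rel.obj}")
    return "\n".join(lines)


# An assumed, deliberately tiny Web publishing domain.
classes = [
    OntologyClass("Content", "Anything the publishing team manages.",
                  properties={"title": "text"}),
    OntologyClass("Article", "A written piece published on the site.",
                  parent="Content", properties={"publishDate": "date"}),
    OntologyClass("Author", "A person credited with writing an Article.",
                  properties={"name": "text"}),
]
relationships = [Relationship("Article", "writtenBy", "Author")]

print(controlled_vocabulary(classes, relationships))
```

Even a toy like this forces the useful questions: what exactly counts as an Article, and who is allowed to be an Author? A real project would more likely capture the model in a tool such as Protégé and export from there.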

About the author

Brice Dunwoodie

Brice is the founder of Simpler Media Group, Inc., the organization behind the CMSWire and Reworked publications, and the creator of the DX Summit and Digital Workplace Experience conferences.