The BBC’s website for the 2010 World Cup stood out for the sheer amount of rich information it contained. Every player on every team in every group had their own web page, and the ease with which you could navigate from one piece of content to the next was remarkable. Within the Semantic Web community, the website was notable for one more reason: it was made possible by the BBC’s embrace of Semantic Web technologies.
The first two articles in this series gave an overview of the Semantic Web and then examined Semantic Web technologies in detail. This time around, I’m pleased to share the story of the BBC’s adoption of Semantic Web technologies to help power BBC Online, as related to me by Yves Raimond (BBC Senior R&D Engineer), Michael Smethurst (BBC Senior Content Producer) and Olivier Thereaux (BBC Senior Technologist).
Lee Feigenbaum (LF): When did the BBC first start using Semantic Web technologies? When did the technologies first move to production?
Yves Raimond, Michael Smethurst and Olivier Thereaux (BBC): It's difficult to pinpoint an exact moment when the BBC first started to use Semantic Web technologies. It was more something we evolved toward from a shared approach and shared philosophy. We have been thinking in Linked Data terms for seven or eight years without necessarily using specific technologies. A rough chronology would be:
- 2004: Work started on PIPs (programme information pages), which aimed to create a Web page for every radio programme broadcast by the BBC. This began our approach of using one page (one URL) per thing and one thing per page (URL).
- 2005: Tom Coates published "The Age of Point-at-Things," a blog post filling out some of the thinking behind giving things identifiers and making those identifiers HTTP URIs. Also in 2005, BBC Backstage was launched as an attempt to open BBC data and build a developer community around that data.
- 2006: Work began on /programmes, a replacement for PIPs covering both radio and TV. Around the same time we bought, in bulk, copies of Eric Evans's "Domain-Driven Design," which influenced the way we designed and built websites to expose more of the domain model to users. Building on Backstage, we added data views to /programmes (JSON, XML, YAML, etc.).
- 2007: We started work on rebuilding /music as a way to add music context to our news and programmes. Because we didn't have our own source of music metadata, we looked for people to partner with and settled on MusicBrainz because of their liberal data licensing. Previously we had siloed micro-sites for programmes and music. By stitching MusicBrainz artist identifiers into our playout systems, we linked up these silos and allowed journeys between /programmes and /music. At the same time as we started to consume open data, we also started to publish Linked Open Data, creating the Programmes Ontology and adding RDF to both /music and /programmes. We found it much easier to develop separate but related applications in a loosely coupled fashion by dogfooding our own data: /programmes uses data views from /music and vice versa.
- 2008: We rebuilt more of bbc.co.uk (/nature and /food) according to domain-driven design and Linked Data principles, publishing a Wildlife Ontology and RDF for /nature. Again, we borrowed open data to build a framework of context around our content: this was when we started using the web as our CMS and the web community as our editors.
Up to this point we'd published ontologies and RDF and also consumed RDF, but we were still using relational databases (rather than triple stores) to serve websites.