The BBC’s website for the 2010 World Cup was notable for the raw amount of rich information that it contained. Every player on every team in every group had their own web page, and the ease with which you could navigate from one piece of content to the next was remarkable. Within the Semantic Web community, the website was notable for one more reason: it was made possible by the BBC’s embrace of Semantic Web technologies.
In the first two articles in this series, we gave an overview of the Semantic Web and then examined Semantic Web technologies in detail. This time around, I’m pleased to share the story of the BBC’s adoption of Semantic Web technologies to help power BBC Online, as related to me by Yves Raimond (BBC Senior R&D Engineer), Michael Smethurst (BBC Senior Content Producer) and Olivier Thereaux (BBC Senior Technologist).
Lee Feigenbaum (LF): When did the BBC first start using Semantic Web technologies? When did the technologies first move to production?
Yves Raimond, Michael Smethurst and Olivier Thereaux (BBC): It's difficult to pinpoint an exact moment when the BBC first started to use Semantic Web technologies. It is more something we evolved toward from a shared approach and philosophy. We have been thinking in Linked Data terms for seven or eight years without necessarily using specific technologies. A rough chronology would be:
- 2004: Work started on PIPs (programme information pages), which aimed to create a Web page for every radio programme broadcast by the BBC. This began our approach of using one page (one URL) per thing and one thing per page (URL).
- 2005: Tom Coates published "The Age of Point-at-Things," a blog post filling out some of the thinking behind giving things identifiers and making those identifiers HTTP URIs. Also in 2005, BBC Backstage was launched as an attempt to open BBC data and build a developer community around that data.
- 2006: Work began on /programmes, a replacement for PIPs covering both radio and TV. Around the same time we bought -- in bulk -- copies of Eric Evans's "Domain-Driven Design," which influenced the way we designed and built websites to expose more of the domain model to users. Building on Backstage, we added data views to /programmes (JSON, XML, YAML, etc.).
- 2007: We started work on rebuilding /music as a way to add music context to our news and programmes. Because we didn't have our own source of music metadata, we looked for people to partner with and settled on MusicBrainz because of their liberal data licensing. Previously we had siloed micro-sites for programmes and music. By stitching MusicBrainz artist identifiers into our playout systems, we linked up these silos and allowed journeys between /programmes and /music. At the same time as we started to consume open data, we also started to publish Linked Open Data, creating the Programmes Ontology and adding RDF to both /music and /programmes. We found it much easier to develop separate but related applications in a loosely coupled fashion by dogfooding our own data: /programmes uses data views from /music and vice versa.
- 2008: We rebuilt more of bbc.co.uk (/nature and /food) according to domain-driven design and Linked Data principles, publishing a Wildlife Ontology and RDF for /nature. Again we borrowed open data to build a framework of context around our content: this was the start of us using the web as our CMS and the web community as our editors.
Up to this point we'd published ontologies and RDF and also consumed RDF, but we were still using relational databases (rather than triple stores) to serve websites.
- 2010: Published the World Cup website using a BigOWLIM triple store [LF: a triple store is a database that stores RDF data]. News articles were tagged with entities in the triple store, and inference was used to propagate those tags to all relevant entities through the graph.
- 2011: Rolled out the World Cup approach across the whole of BBC Sport.
- 2012: Rolled out the Olympics site using the same model as BBC Sport.
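The tagging-and-inference approach in the 2010 entry can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the BBC's implementation: the entity names, the `memberOf` and `playsIn` predicates, and the `propagate_tags` function are all assumptions made for the example, and a real triple store would express the same rule through its own inference or query machinery.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all names are illustrative.
TRIPLES = [
    ("player:striker-1", "memberOf", "team:alpha"),
    ("team:alpha", "playsIn", "group:one"),
    ("player:keeper-2", "memberOf", "team:beta"),
    ("team:beta", "playsIn", "group:one"),
]

# Editorial input: each article is tagged by hand with one or more entities.
ARTICLE_TAGS = {
    "article:1": {"player:striker-1"},
    "article:2": {"team:beta"},
}

# Assumption: a tag on an entity also applies to whatever it belongs to.
PROPAGATING = {"memberOf", "playsIn"}


def propagate_tags(triples, article_tags, propagating):
    """Return article -> all entities reachable from its editorial tags."""
    edges = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in propagating:
            edges[subject].add(obj)

    inferred = {}
    for article, tags in article_tags.items():
        seen = set(tags)
        stack = list(tags)
        while stack:  # depth-first walk along propagating edges
            entity = stack.pop()
            for parent in edges.get(entity, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        inferred[article] = seen
    return inferred


if __name__ == "__main__":
    result = propagate_tags(TRIPLES, ARTICLE_TAGS, PROPAGATING)
    for article in sorted(result):
        print(article, sorted(result[article]))
```

In this sketch, a single editorial tag on a player is enough for the article to surface on that player's team and group pages as well, which is the kind of reuse of editorial effort the chronology describes.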
LF: Could you describe the main use cases of Semantic Web technologies at the BBC? Would you characterize these use cases as “dynamic content publishing”?
BBC: Our use of Linked Data breaks down into three areas:
[LF: the term Linked Data refers to a specific set of best practices for working with Semantic Web (RDF) data; the term Linked Open Data refers to Linked Data that is freely available on the Web.]
- Publishing Linked Data: to make our content more findable (e.g. by search engines) and more linkable (e.g. via social media or by other Linked Data publishers using the same vocabularies and identifiers).
- Consuming Linked Data: to “borrow” additional context for our content where we don’t have existing data and want to cut content by specific domains (music, nature, food, sport). The Linked Open Data that we use also helps give us additional links between domains.
- Managing data internally as Linked Data: to maximize the use we get out of editorial input by propagating editorially added links across data graphs; to make more links between otherwise siloed sites.
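Linking otherwise siloed sites depends on both sides using the same identifier for the same thing, as with the MusicBrainz artist identifiers stitched into the playout systems. The following is a rough sketch of that join, under assumed data: the record layout, the URL paths, the placeholder identifiers (not real MusicBrainz IDs) and the `link_silos` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical playout log: which artist was played in which programme.
# The artist identifiers are placeholders, not real MusicBrainz IDs.
PLAYOUT = [
    {"programme": "/programmes/p0001", "artist_id": "artist-aaa"},
    {"programme": "/programmes/p0001", "artist_id": "artist-bbb"},
    {"programme": "/programmes/p0002", "artist_id": "artist-aaa"},
    {"programme": "/programmes/p0002", "artist_id": "artist-unknown"},
]

# Hypothetical music-side index keyed by the same shared identifier.
ARTISTS = {
    "artist-aaa": "/music/artists/artist-aaa",
    "artist-bbb": "/music/artists/artist-bbb",
}


def link_silos(playout, artists):
    """Build links in both directions wherever the shared identifier matches."""
    programme_to_artists = defaultdict(set)
    artist_to_programmes = defaultdict(set)
    for row in playout:
        artist_url = artists.get(row["artist_id"])
        if artist_url is None:
            continue  # no shared identifier on the music side, so no link
        programme_to_artists[row["programme"]].add(artist_url)
        artist_to_programmes[artist_url].add(row["programme"])
    return dict(programme_to_artists), dict(artist_to_programmes)


if __name__ == "__main__":
    by_programme, by_artist = link_silos(PLAYOUT, ARTISTS)
    for programme in sorted(by_programme):
        print(programme, "->", sorted(by_programme[programme]))
    for artist in sorted(by_artist):
        print(artist, "->", sorted(by_artist[artist]))
```

Because the links are derived from the shared identifier rather than entered by hand, each site can offer journeys to the other without either needing to know the other's internal data model.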