It’s not really accurate to call our use cases “dynamic content publishing.” Our actual content (TV and radio programmes and news articles) is still fairly static. The Linked Data / Domain Driven Design approach is less about dynamic content and more about dynamic context and dynamic aggregations around that content that let us maximize exposure to our content by placing it in different contexts (wildlife, music, food, football, etc.).
Because bbc.co.uk has content in so many domains, it’s like a microcosm of the web. One of our goals with this work is to move from a set of siloed sites to a coherent service, which we can only do if our content is well described and interlinked. Finally, by using domain-native URL keys we can generate more inbound link density and make our content more findable on search engines.
LF: How did the BBC produce these sites before the Semantic Web approach?
BBC: By hand. There was lots of hand-rolling of microsites around specific items. There were lots of aggregations maintained by editorial hands. The Semantic Web approach meant that we could provide many more aggregations and many more routes to content at lower cost.
LF: Have you been able to measure any results from these efforts?
BBC: For the Olympics, the Dynamic Semantic Publishing (DSP) architecture allowed us to offer a single page for every country (200+), every athlete (10000+), every discipline/event (400-500) and every venue. All of these pages were complete with aggregated relevant stats and news.
[LF: A blog entry about the 2010 World Cup site indicates that it has over 700 pages for teams, groups and players. This was an amount of content that never would have even been considered without the automated Semantic Web approach. The same blog entry puts this into perspective: “The World Cup site had more index pages than the rest of the [hand-edited] BBC Sport site in its entirety.”]
LF: Were there particular things that you learned from the World Cup and were able to change for the Olympics?
BBC: The World Cup site worked. Everything that we learned at a relatively small scale from the World Cup site could be applied to the Olympics, which was an order of magnitude more complex.
In the context of adapting our architecture to the complexity of all of the different Olympic events and disciplines, we made one significant change: We added a MarkLogic XML database. In the words of Senior Technical Architect David Rogers:
“Fundamental to this approach was the use of MarkLogic to store and retrieve data. MarkLogic is an XML database which uses XQuery to store and retrieve data. Given the timescales, this project would not have been achievable using a SQL database, which would have pushed the design towards more complete modelling of the data. Using MarkLogic, we could write a complete XML document, and retrieve that document either by reference to its location, a URI, or using XQuery to define criteria about its contents.”
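[LF: The two retrieval styles in that quote, by URI and by criteria about a document's contents, can be illustrated with a small in-memory sketch. This is not MarkLogic or XQuery. It uses Python's standard xml.etree.ElementTree module, and the class name XmlDocumentStore, the URIs and the athlete documents are all hypothetical, made up for illustration.]

```python
import xml.etree.ElementTree as ET


class XmlDocumentStore:
    """Toy stand-in for an XML document database (not the MarkLogic API)."""

    def __init__(self):
        self._docs = {}  # maps a URI string to a parsed XML root element

    def put(self, uri, xml_text):
        # Store the whole document as-is; no relational schema is required.
        self._docs[uri] = ET.fromstring(xml_text)

    def get(self, uri):
        # Retrieval by location: look the document up by its URI.
        return self._docs[uri]

    def find(self, path, text):
        # Retrieval by content: return the URIs of documents that have an
        # element at `path` (relative to the root) whose text equals `text`.
        return sorted(
            uri
            for uri, root in self._docs.items()
            if any(el.text == text for el in root.iterfind(path))
        )


# Hypothetical documents; the names and URIs are invented.
store = XmlDocumentStore()
store.put(
    "/athletes/a.xml",
    "<athlete><name>Athlete A</name><country>GBR</country>"
    "<discipline>Rowing</discipline></athlete>",
)
store.put(
    "/athletes/b.xml",
    "<athlete><name>Athlete B</name><country>FRA</country>"
    "<discipline>Rowing</discipline></athlete>",
)
store.put(
    "/athletes/c.xml",
    "<athlete><name>Athlete C</name><country>GBR</country>"
    "<discipline>Judo</discipline></athlete>",
)

by_uri = store.get("/athletes/b.xml").findtext("name")
british = store.find("country", "GBR")
rowers = store.find("discipline", "Rowing")
```

[LF: Here by_uri is fetched by location, while british and rowers are the kind of content-based aggregations that could back a country page or a discipline page.]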
LF: Are there other uses of Semantic Web technology not related to content publishing that are being explored within BBC?
BBC: We are currently exploring various other uses of Semantic Web technologies within BBC R&D. In particular we’re looking at ways in which Linked Data can be used to help search and discovery of archive content. We have been working on automatically identifying the topics and the contributors for BBC programmes from their content, using a combination of Linked Data, signal processing, speech-to-text and Named Entity Recognition technologies, which we have been talking about in various places, such as the Linked Data on the Web workshop and at WWW’2012. The automatically generated links from programmes to entities described in the Linked Data cloud might be incorrect in places, so we are also exploring how users can validate or correct those links, and how this feedback can be taken into account within our automated interlinking workflow. We are planning to write in more detail about our experiments in that space on the BBC R&D blog in the next couple of weeks.
LF: What are your plans going forwards?
BBC: We are currently annotating quite a lot of our content with Linked Data URIs to drive a number of aggregations on our site, but we are making little use of the connections between all these URIs. So far, we have only been using those in our automated tagging tools, to disambiguate between candidate identifiers. There is a big opportunity in using those connections for storytelling purposes -- using paths in that graph of data to help tell stories around our content. It becomes even more of an opportunity if we start describing the content of individual programmes in more detail, such as describing the narrative structure of dramas, for example. We started some investigation in that area in our Mythology Engine project, but there is much more that could be done.
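[LF: To make "paths in that graph of data" concrete, here is a minimal sketch of my own, not the BBC's implementation. It assumes the connections between URIs can be treated as an undirected graph and uses a breadth-first search to find the shortest chain of links between two programmes. The ex: identifiers and the edges are hypothetical.]

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (uri, uri) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of URIs, or None if unconnected."""
    if start == goal:
        return [start]
    previous = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical links between programmes and the things they are about.
GRAPH = build_graph([
    ("ex:programme/a", "ex:person/x"),
    ("ex:person/x", "ex:place/y"),
    ("ex:place/y", "ex:programme/b"),
    ("ex:programme/a", "ex:topic/z"),
])

story = shortest_path(GRAPH, "ex:programme/a", "ex:programme/b")
```

[LF: The resulting chain (programme, person, place, programme) is the raw material for a sentence linking two pieces of content. A real system would presumably also weigh which kinds of links make an interesting story.]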
I think there are several lessons to learn from the BBC’s experience with Semantic Web technologies:
- Embracing these technologies was an evolutionary process; it started with a general philosophy, rolled out incrementally, and ended up providing a significant strategic advantage.
- The BBC invested a great deal of energy in being able to clearly articulate the vision and the value of the Semantic Web approach on their various blogs, and in doing so sought to engage a much larger community beyond the BBC.
- Semantic Web technologies are not an end in themselves. While they play a crucial role in what the BBC has accomplished with dynamic site publishing, there are many other technologies (such as XML, Silverlight and standard HTTP) that need to come together for this application.
My thanks to Yves, Michael, and Olivier for taking the time to contribute their experiences for us all.
Editor's Note: To read the beginning of Lee's series on the semantic web, see The Semantic Web and the Modern Enterprise.