Spurred on by an online debate about the distinction between text analytics and semantic content enrichment, I turn in this article to the pressing question: "What does semantic content enrichment mean?"
As IBM's Marie Wallace remarked, it’s great to see the term semantic content enrichment generating discussion, although, she continued, "I suspect that most people still don’t differentiate it from just text analytics."
The Distinction
Oh, but there is a difference. Let’s explore it via the definitions that follow: first of text analytics, then of content analytics and finally of content enrichment, and then see where the ensemble takes us.
First definition:
Text analytics is a set of software and transformational steps that discover business value in “unstructured” text. (Analytics in general is a process, not just algorithms and software.) The aim is to improve automated text processing, whether for search, classification, data and opinion extraction, business intelligence or other purposes.
To expand on this definition a bit and bridge from text to the wider content world:
Text analytics draws on data mining and visualization and also on natural-language processing (NLP). Supplement NLP with technologies that recognize patterns and extract information from images, audio, video and composites and you have content analytics.
The concept of content enrichment is easy to grasp: Every link in this article (Web links are created with the HTML “a” anchor tag) is a bit of content enrichment. And semantic content enrichment? Marie Wallace puts it this way, focusing on text but with concepts that extend to the broad set of content types:
When I think about semantic enrichment, I see it as transforming a piece of content into a linked data source. In order to do this you do indeed need text analytics for entity and relationship extraction, but you need more than that…. A text analytics engine might recognize that [Marie Wallace] is a person, [Ireland] is a place, and Marie comes from Ireland and annotate the entities/relationships found. However when doing semantic enrichment, I would want to convert those annotations to openly addressable URIs that contribute to the linked data cloud.
URIs are uniform resource identifiers, Semantic Web terminology for IDs, unique within a namespace, that name or locate things. Web URLs (e.g., http://whitehouse.gov/) are a type of URI.
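To make Wallace's step from annotations to addressable URIs concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's method: the `http://example.org/` namespace, the `mint_uri` and `enrich` helpers, and the shape of the annotation records are all hypothetical. A real system would more likely link to identifiers that already exist in the linked data cloud than mint its own.

```python
from urllib.parse import quote

# Hypothetical base namespace for minted URIs (an assumption for illustration).
BASE = "http://example.org/"


def mint_uri(kind: str, label: str) -> str:
    """Build an addressable URI from an annotation type and its label."""
    return f"{BASE}{kind}/{quote(label.replace(' ', '_'))}"


# Annotations as a text analytics engine might emit them (illustrative shape).
entities = [
    {"text": "Marie Wallace", "type": "person"},
    {"text": "Ireland", "type": "place"},
]
relations = [
    {"subject": "Marie Wallace", "predicate": "comes from", "object": "Ireland"},
]


def enrich(entities, relations):
    """Convert entity and relationship annotations into URI-based triples."""
    uri_of = {e["text"]: mint_uri(e["type"], e["text"]) for e in entities}
    triples = []
    for r in relations:
        triples.append((
            uri_of[r["subject"]],
            mint_uri("relation", r["predicate"]),
            uri_of[r["object"]],
        ))
    return triples


triples = enrich(entities, relations)
for s, p, o in triples:
    # Print each triple in an N-Triples-like form.
    print(f"<{s}> <{p}> <{o}> .")
```

The text analytics step ends with the `entities` and `relations` records; the enrichment step is the conversion of those records into subject, predicate and object URIs that other data sources could reference.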
Rather than write my own explanation of annotation, I’ll reuse one from Ontotext, a semantic-technology developer:
Annotation, or tagging, is about attaching names, attributes, comments, descriptions, etc. to a document or to a selected part in a text. It provides additional information (metadata) about an existing piece of data.
Semantic Annotation goes one level deeper:
- It enriches the unstructured or semi-structured data with a context that is further linked to the structured knowledge of a domain.
- It allows results that are not explicitly related to the original search.
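The second point can be sketched in a few lines of Python. This is a toy illustration only: the document IDs, the `KNOWLEDGE` table and the `semantic_search` function are hypothetical, and the example.org URIs stand in for whatever identifiers a real domain model would use. It shows how a search on one concept can return documents that never mention it, because their annotations link to it through domain knowledge.

```python
# Hypothetical domain knowledge: entity URI -> broader concepts it is linked to.
KNOWLEDGE = {
    "http://example.org/place/Ireland": {"http://example.org/place/Europe"},
    "http://example.org/place/France": {"http://example.org/place/Europe"},
}

# Hypothetical annotated documents: document ID -> entity URIs found in its text.
ANNOTATED_DOCS = {
    "doc-1": {"http://example.org/place/Ireland"},
    "doc-2": {"http://example.org/place/France"},
    "doc-3": set(),
}


def semantic_search(concept_uri):
    """Return IDs of documents annotated with the concept or an entity linked to it."""
    hits = []
    for doc_id, uris in ANNOTATED_DOCS.items():
        for uri in uris:
            if uri == concept_uri or concept_uri in KNOWLEDGE.get(uri, set()):
                hits.append(doc_id)
                break  # one matching annotation is enough for this document
    return sorted(hits)


results = semantic_search("http://example.org/place/Europe")
print(results)
```

Neither annotated document mentions Europe, yet both are returned because their entities are linked to it in the knowledge table; the unannotated document is not.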
The earliest specific reference to semantic content enrichment that I’ve encountered is in an Ontotext paper, Towards Semantic Web Information Extraction, presented at the 2003 International Semantic Web Conference (ISWC).