Recent news about surveillance of communications metadata has propelled this normally arcane part of the information world to the top of the stack.
If you’re interested in learning more about metadata, there is no lack of resources: a Google search yields more than 100 million results and even Wikipedia uses 14 pages and 35 references just to explain metadata.
But if you’re an information creator, manager or publisher, all of this admittedly valuable information can seem like an impenetrable maze, especially when you need to design your own information resources and technology. There is a lot about metadata, but not much about how to design and manage it for your particular need. The currently favored description of metadata as “data about data” is nominally correct but not very illuminating.
Metadata for Content Creators
Let me suggest that metadata may be viewed in two major functional categories, each one important in the design and implementation of information automation efforts:
- Metadata as data about the physical structure and processing of information files, ignoring for the most part the intellectual content of the files. Let’s call this “structural metadata.”
- Metadata as data about the intellectual content of information files with particular emphasis on ways to locate the content. Let’s call this category “finding aid metadata.”
Structural metadata is the more analyzed and described of the two categories, perhaps because it is the more tightly integrated with the technology used to create and process files. In truth, virtually everyone creates and uses this type of metadata. Word processors, for example, create and keep data about the dates, authors, revisers and accesses of every document they generate. MS Office even uses XML as its underlying data markup, for content as well as metadata and other properties. While this won’t help much in making your content usable, it does make the metadata in MS Office files easily accessible via standard XML software tools.
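As a rough illustration of how accessible this metadata is, the sketch below reads a few properties from an Office file using only the Python standard library. It assumes the file follows the usual Office Open XML packaging, in which a .docx, .xlsx or .pptx file is a ZIP archive with a docProps/core.xml part. The function name and the dictionary labels are illustrative choices, not part of any Office API.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label (our own, hypothetical naming) -> element to look up.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return basic structural metadata from an Office Open XML file.

    `source` may be a file path or a binary file-like object.
    Assumes the package contains docProps/core.xml; zipfile raises
    KeyError if that part is missing.
    """
    with zipfile.ZipFile(source) as package:
        xml_bytes = package.read("docProps/core.xml")

    root = ET.fromstring(xml_bytes)
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # A property the authoring tool did not record is reported as None.
        result[label] = element.text if element is not None else None
    return result
```

Calling `read_core_properties("report.docx")` on a hypothetical file would return a dictionary of whatever author and date values the word processor recorded, which is exactly the kind of metadata most users create without ever seeing it.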
What’s important about structural metadata is that, because it is mostly computer-to-computer communication, users have less control over it when designing or planning for content creation. Perhaps the most important aspect of structural metadata for the content creator and manager is the selection of software that does a good job of creating it, both in open formats like XML and with a robust set of values, making the resulting files more transparent and valuable to downstream software and users.
Finding Aid Metadata
For content creators, managers and publishers, this is the meat of metadata. It has been said that much of the value of information lies in being able to find it. Melvil Dewey understood this when, in the 1870s, he devised a numerical method of classifying books and monographs by subject, a design sufficiently robust that we still use it: the Dewey Decimal Classification (DDC), better known as the Dewey Decimal System.
While technology has changed the ground rules, the concept of finding aids is no less important than it was then, and has perhaps become even more so as the sea of content in which we live has grown exponentially. Finding a needle in the haystack, after all, becomes increasingly difficult as the haystack grows.
Historically, library finding aids depended on the creation of external cataloging and catalog cards or printed lists to lead users to the desired content. Books could be shelved in broad subject-matter sections, but anything beyond that required a visit to the card or book catalog to find the proper Dewey subject classification and the books that fell within it.
With the dawn of the Internet and electronic display of content, we could begin to merge the finding aids with the content itself: first by sequentially searching entire files, then by developing search engines capable of building and rapidly searching inverted word lists for entries that matched the user’s query.
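The idea of an inverted word list can be sketched in a few lines. The example below is a minimal illustration, not how any particular search engine works: it maps each word to the set of documents that contain it, then answers a query by intersecting those sets. The function names and sample documents are made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's postings and narrow with each further word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample collection.
documents = {
    "doc1": "Metadata is data about data",
    "doc2": "Search engines build inverted word lists",
    "doc3": "Faceted search helps users navigate data",
}
index = build_inverted_index(documents)
matches = search(index, "search data")  # only doc3 contains both words
```

Because the index is built once and each lookup touches only the entries for the query words, searching stays fast even as the collection grows, which is the advantage over reading every file sequentially.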
As the sheer size of the Internet grew, however, search engines needed more detailed and specific information to perform well, sometimes including terminology not present in the actual content. To deal with this, concepts like keywords in files, external search terminology, usage pattern analysis and other techniques came into use.
At the same time, users, dealing with massive search hit lists, began agitating for more direct ways of finding what they wanted. In short, they wanted to navigate to their desired information rather than searching, however efficiently, against entire collections of content. Concepts like “faceted” cataloging and search came into use as a means to organize content to allow users to navigate down defined paths or “facets” to the content they are seeking.
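Faceted navigation depends directly on finding aid metadata attached to each item. The sketch below assumes a small, invented catalog in which every record carries a few facet values (subject, format, year). It shows the two operations a faceted interface needs: counting how many items sit under each value of a facet, and narrowing the collection as the user picks values. All names and records here are hypothetical.

```python
from collections import Counter

# Hypothetical catalog: each record is content plus its finding aid metadata.
CATALOG = [
    {"title": "Intro to XML", "subject": "markup", "format": "book", "year": 2010},
    {"title": "XML in Practice", "subject": "markup", "format": "article", "year": 2012},
    {"title": "Cataloging Basics", "subject": "libraries", "format": "book", "year": 2012},
]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


def narrow(records, **selected):
    """Keep only records whose metadata matches every selected facet value."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]


# A user first sees the available subjects, then drills down step by step.
subjects = facet_counts(CATALOG, "subject")
markup_items = narrow(CATALOG, subject="markup")
markup_books = narrow(CATALOG, subject="markup", format="book")
```

Each selection shrinks the set of candidates and the counts tell the user what lies down each path before they take it, so they navigate to the content rather than wading through a hit list.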
In all of this, metadata grew in importance.