One of the most challenging aspects of any content management initiative is getting a view into all your content. No wonder. If you could see it all clearly, you wouldn’t need content management, right?

The content analysis phase of a project can be multifaceted. You can analyze content at the surface level from just one or two angles, or you can look at it from several angles and peer deep into its makeup. Which facets you analyze and how deep you go depend on your goals, weighed against your budget, resources, and timeline. Before you begin analyzing your content, determine your goals:

# Discoverability. To make content easier for people to find, focus on developing a solid taxonomy that includes both a logical hierarchy and a tagging scheme, with an emphasis on search engine optimization (SEO).
# Reusability. If you want to reuse content across publications or formats, analysis becomes a process of “componentization”: determining what the reusable chunks are, naming those chunks, and tagging them appropriately. Analysis for reuse can require significant edits or rewrites, because existing content often doesn’t break into chunks the way you would like.
# Personalization. To deliver personalized content dynamically by matching content components to audience profiles, you definitely need to chunk your content and tag it, specifically by audience. You also need clearly defined audiences and a good method for collecting audience profile data, so that you can match each profile to the right content.
# Improved Quality. Rationalizing content so that it is more consistent and does a better job of meeting audience needs adds yet another facet. The process may involve editing, purging, or rewriting poorly written or obsolete content. This kind of qualitative analysis can be the most difficult, because it often requires subject matter experts, not just “a few temps,” to review and tag the content.
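Personalization only works when two sets of tags meet: the audience tags assigned to content components and the data in each visitor's profile. The minimal Python sketch below assumes a simple tag-overlap rule (real personalization engines apply their own, often more elaborate, matching logic) and shows why both halves matter: a component with no audience tags, or a visitor with no profile data, never matches anything. The `Component` class, the `match_components` function, and all tag names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A reusable content chunk and the audience tags assigned during analysis."""
    component_id: str
    audience_tags: frozenset = frozenset()


def match_components(components, profile_tags):
    """Return IDs of components sharing at least one tag with a visitor's
    profile, most overlapping tags first (ties keep library order)."""
    scored = [
        (len(c.audience_tags & profile_tags), c.component_id) for c in components
    ]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)  # stable, descending
    return [component_id for _, component_id in matches]


# Hypothetical library: "small-business" stands in for a company-size audience tag.
library = [
    Component("pricing-overview", frozenset({"small-business", "buyer"})),
    Component("api-reference", frozenset({"developer"})),
    Component("enterprise-sla", frozenset({"enterprise", "buyer"})),
]

print(match_components(library, {"small-business", "buyer"}))
# prints ['pricing-overview', 'enterprise-sla']
print(match_components(library, set()))  # no profile data, so prints []
```

Ranking by overlap count is only one possible rule; the point is that the quality of the match can be no better than the audience tagging done during analysis.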
Regardless of which goal is your priority, one of the most basic steps is to conduct a “content inventory” (also known as a content audit). Content inventories are most often created as spreadsheets and typically include fields such as Title, File Name, File Type, Publish Date, Subject, Keywords, Owner or Author (if known), URL, and Comments.

Some inventory tasks can be automated. For example, exporting a file list from a server, including data such as name, file type, and date, is relatively easy. Commercial software tools can also help automate tagging by searching for recurring words and patterns within the text.

Tedious as it may be, for some of the goals above the only sure way to get there is to have a person look at each piece of content (file, Web page, document) and record information about it: in other words, the metadata. Before launching headlong into this type of inventory, however, take the time to identify exactly which metadata you want to capture.

One goal on a recent project was to map content to audience tasks. Much effort had gone into identifying audiences and tasks, and someone had diligently reviewed and inventoried hundreds of content items. Unfortunately, that person didn’t know about a requirement to match content based on the audience’s company size. Because of this omission, the work essentially doubled: it took another full pass through all of the content to tag it with this critical metadata.

Don’t let this happen to you. As with any project, the most important step is to know your goals. In a content inventory, every metadata field you forget can mean another full pass through the content, multiplying your workload. How do you avoid it? Look at your content from multiple facets, like the ones suggested above. Take the time to peer beyond the surface into the prism of your body of content, and decide whether you need (and can afford) to map out the full spectrum of metadata. Maybe, in the first round, you prioritize violet, blue, and red.
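The automatable part of a content inventory, together with the up-front decision about which fields to capture, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a substitute for a commercial tool: it assumes the content has been exported to a local folder, uses each file's last-modified time as a stand-in for its publish date, and suggests keywords with nothing more than a word-frequency count. The column list, the function names, and the “Audience” and “Company Size” fields are hypothetical; the point is that every column, including the ones a person must fill in later, is fixed before anyone starts reviewing.

```python
import csv
import re
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Decide every column BEFORE the inventory starts. The automatic columns are
# filled by this script; the manual ones are left blank for a human reviewer.
# "Audience" and "Company Size" are hypothetical project-specific fields.
AUTO_FIELDS = ["File Name", "File Type", "Last Modified", "Path", "Suggested Keywords"]
MANUAL_FIELDS = ["Title", "Subject", "Keywords", "Owner", "Audience",
                 "Company Size", "Comments"]
FIELDS = AUTO_FIELDS + MANUAL_FIELDS

TEXT_TYPES = {".txt", ".md", ".htm", ".html"}  # files scanned for keywords
STOPWORDS = {"that", "this", "with", "from", "your", "have", "will", "about"}


def suggest_keywords(text, top_n=5):
    """Naive keyword suggestion: the most frequent words of 4+ letters."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag strip, not a real HTML parser
    words = re.findall(r"[a-z]{4,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def build_inventory(root):
    """Walk a folder and return one inventory row (a dict) per file."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        row = dict.fromkeys(FIELDS, "")  # manual columns start blank
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        row["File Name"] = path.name
        row["File Type"] = path.suffix.lower().lstrip(".") or "(none)"
        row["Last Modified"] = modified.date().isoformat()  # proxy for publish date
        row["Path"] = path.relative_to(root).as_posix()
        if path.suffix.lower() in TEXT_TYPES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            row["Suggested Keywords"] = "; ".join(suggest_keywords(text))
        rows.append(row)
    return rows


def write_inventory(rows, csv_path):
    """Save the rows as CSV, which opens directly in a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_inventory(build_inventory("site_export"), "content_inventory.csv")` would produce a spreadsheet-ready file for a hypothetical `site_export` folder, with Title, Subject, Owner, and the other manual columns left blank for reviewers. Suggested keywords are only a starting point for a person to confirm, not finished metadata.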
However, be realistic about your resources and budget. Know that taking on the full spectrum of content analysis across all of your content may be a dream that’s somewhere over the rainbow!

Editor's Note: See our article “What is an Ontology? And Why We Need Them” for additional context on building content inventories, performing analyses, and achieving a common system vocabulary.

About Rita Warren

Rita Warren of ZiaContent, Inc. has been working in the fields of computer software and new media for more than a decade, bringing a broad range of experience in information architecture, content development, and content management systems. ZiaContent helps clients by designing and creating sensible content and working with them to implement sensible content management solutions. Rita is also a frequent contributor to