Stop Pulling Teeth: A Better Way to Classify Documents


People hate uploading a document only to find half a dozen fields they must fill in before the upload can finish. Required fields may seem like an easy approach, but they often create a love/hate relationship with SharePoint: users see them as a limitation of SharePoint and turn to a file share or Dropbox instead.

Alternatively, they take the easy way out and pick the first entry in each drop-down, producing metadata that is populated but wrong. Required manual tagging becomes one more thing users have to do to get their work done.

Find the Happy Medium 

I’m not suggesting that users are totally off the hook. There’s a happy medium that ensures content is accurately tagged and classified. Why not build the key information into the template so that the automation can be precise? An automated content extraction process can pick up key pieces of information from the document, such as department, division, city, region, area, district, document number and pricing, and use them to create metadata by machine.
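As a rough illustration of template-driven extraction, the Python sketch below assumes a template that carries a labeled header block with lines such as `Department: Finance`. The label list `TEMPLATE_FIELDS` and the function `extract_template_fields` are hypothetical; a real extraction tool would read the document format itself rather than scanning plain text.

```python
import re

# Hypothetical labels a document template might carry in its header block,
# e.g. a line reading "Department: Finance". Adjust to match your templates.
TEMPLATE_FIELDS = ("Department", "Division", "Region", "District", "Document Number")


def extract_template_fields(text: str) -> dict[str, str]:
    """Return {label: value} for each known template label found in the text."""
    metadata: dict[str, str] = {}
    for field in TEMPLATE_FIELDS:
        # Label at the start of a line, a colon, then the rest of that line.
        pattern = rf"^[ \t]*{re.escape(field)}[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE)
        if match:
            metadata[field] = match.group(1)
    return metadata


if __name__ == "__main__":
    sample = (
        "Quarterly Pricing Review\n"
        "Department: Finance\n"
        "Region: EMEA\n"
        "Document Number: FIN-0042\n"
        "\n"
        "Body text follows...\n"
    )
    print(extract_template_fields(sample))
    # → {'Department': 'Finance', 'Region': 'EMEA', 'Document Number': 'FIN-0042'}
```

Because the labels come from the template, the values land in the same fields every time, which is what makes the machine-generated metadata precise.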

Encourage users to work with documents the way they normally do, and use a third-party auto-classification tool to extract text-based content, products, subjects and terms from the document. This produces good, standardized metadata for search refinement. The same tool can also flag sensitive information, reporting content that contains code names or personally identifiable information such as credit card numbers, Social Security numbers or phone numbers.
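The sensitive-information check can be sketched with pattern matching plus a checksum. The Python below is a hypothetical example, not the logic of any particular product: it flags US-style Social Security numbers, North American phone numbers, and card-like digit runs that pass the Luhn checksum used by payment card numbers. The patterns are deliberately simple assumptions and will miss formats a commercial classifier would catch; code-name detection (a watch list of project names) is left out.

```python
import re

# Illustrative patterns only; commercial classifiers use far richer rules.
# Card-like runs of 13-16 digits, optionally grouped with spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
# US Social Security number in the common NNN-NN-NNNN layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# North American phone number such as (555) 123-4567 or 555-123-4567.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Return only the categories of sensitive data detected in the text."""
    findings = {
        # Keep card-like matches only if they pass the checksum, to cut false positives.
        "credit_card": [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())],
        "ssn": SSN_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }
    return {kind: hits for kind, hits in findings.items() if hits}


if __name__ == "__main__":
    doc = "Card 4111 1111 1111 1111, SSN 123-45-6789, call (555) 123-4567."
    print(find_sensitive(doc))
    # → {'credit_card': ['4111 1111 1111 1111'], 'ssn': ['123-45-6789'], 'phone': ['(555) 123-4567']}
```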

Users will then see the connection between what they add to a document and what ends up in the metadata used to refine search. Seeing this connection is very important: without it, users will think that even checking the metadata is a useless chore.

A Win-Win Situation

An automated process based on a little core information creates a win-win situation: users get their work done faster and better, without the pain of manual tagging. No more pulling teeth.

You can extend this to content that lives outside of SharePoint as well. SharePoint has a built-in content enrichment API for just that purpose. By connecting third-party tools to that API, you can extract valuable metadata as the content is crawled and either add it as enterprise keywords or, to take it to the next level, set the content type and other attributes automatically. This reinforces your information architecture and leverages your existing taxonomy for browsing, filtering, refining and exploring.
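To make the enrichment step concrete, here is a minimal Python sketch of the mapping a third-party tool might perform before handing results back to SharePoint: match document text against a term set, emit the matched preferred terms as candidate enterprise keywords, and suggest a content type. The term set, synonyms and content-type rules (`TAXONOMY`, `CONTENT_TYPE_FOR_TERM`) are hypothetical, and the sketch does not call SharePoint's content enrichment API itself.

```python
import re
from collections import Counter

# Hypothetical term set: preferred term -> phrases that indicate it in text.
TAXONOMY = {
    "Invoice": ["invoice", "bill", "billing statement"],
    "Contract": ["contract", "agreement", "statement of work"],
    "Pricing": ["pricing", "price list", "rate card"],
}
# Hypothetical rules: preferred terms that imply a content type.
CONTENT_TYPE_FOR_TERM = {"Invoice": "Finance Document", "Contract": "Legal Document"}


def enrich(text: str) -> dict[str, object]:
    """Suggest enterprise keywords and a content type from taxonomy matches."""
    lowered = text.lower()
    counts: Counter = Counter()
    for term, phrases in TAXONOMY.items():
        for phrase in phrases:
            # Whole-word match so "bill" does not fire inside "billing".
            hits = len(re.findall(rf"\b{re.escape(phrase)}\b", lowered))
            if hits:
                counts[term] += hits
    # Content type comes from the most frequent term that has a mapping rule.
    content_type = next(
        (CONTENT_TYPE_FOR_TERM[t] for t, _ in counts.most_common() if t in CONTENT_TYPE_FOR_TERM),
        None,
    )
    return {"keywords": sorted(counts), "content_type": content_type}


if __name__ == "__main__":
    doc = "This agreement sets the pricing. The contract cites the rate card and agreement terms."
    print(enrich(doc))
    # → {'keywords': ['Contract', 'Pricing'], 'content_type': 'Legal Document'}
```

Mapping synonyms to preferred terms is what lets the output reuse your existing taxonomy rather than inventing free-text tags. Note that `Counter.most_common` orders equal counts by first insertion, so a real deployment would want an explicit tie-break rule.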

The power of these tools is remarkable: they not only spare you from pulling teeth, they help you create structure in an unstructured world.

Title image by Partha S. Sahana (Flickr) via a CC BY 2.0 license

About the author

Joel Oleson

Joel was a key player in the launch of Microsoft's collaboration products from 2001 to 2008. During the launch of Microsoft’s first portal and collaboration solutions, Joel was the first dedicated SharePoint Admin for Microsoft IT internal deployments of Tahoe and Office Web Server, and later acted as Architect for the first version of SharePoint Online.