One of my favorite poems is “Naming of Parts,” written in 1942 by the English poet Henry Reed. Every time I hear it I immediately start thinking about taxonomies. Taxonomies can play a very important role in delivering highly relevant search results.

As an illustration: you have been told to specify an aluminum bronze for a component in your production line. Aluminum bronze is not an aluminum alloy but a copper alloy with typically 8 to 11% aluminum along with nickel and iron. It might also be specified as BS2874 or CA104. Whatever the name, it is an example of a high tensile strength alloy, of which aluminum/zinc/copper alloys are another example.

Taxonomy 101

A taxonomy is a hierarchical array of related terms which can be used to widen or narrow a query term. In the example above, a search query about aluminum bronze might return a response where the taxonomy offers options that include other copper alloys, or aluminum alloys, or high tensile strength alloys. Depending on the context, all could be valid responses.
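
As a rough illustration of widening and narrowing, here is a minimal Python sketch using the alloy example. The data structure, the field names `broader` and `alt_labels`, and the functions `resolve`, `broaden` and `narrow` are hypothetical choices made for this sketch, not the format of any particular taxonomy tool or search product.

```python
# Hypothetical toy taxonomy: each preferred term lists its broader terms
# and any alternative labels (for example, standards designations).
TAXONOMY = {
    "aluminum bronze": {
        "broader": ["copper alloys", "high tensile strength alloys"],
        "alt_labels": ["BS2874", "CA104"],
    },
    "aluminum/zinc/copper alloys": {
        "broader": ["high tensile strength alloys"],
        "alt_labels": [],
    },
    "copper alloys": {"broader": [], "alt_labels": []},
    "high tensile strength alloys": {"broader": [], "alt_labels": []},
}


def resolve(query):
    """Return the preferred term for a query, or None if it is unknown.

    Matching is case-insensitive and also checks alternative labels.
    """
    q = query.strip().lower()
    for term, entry in TAXONOMY.items():
        labels = [term] + entry["alt_labels"]
        if q in (label.lower() for label in labels):
            return term
    return None


def broaden(query):
    """Return the broader terms that could be offered to widen a query."""
    term = resolve(query)
    return sorted(TAXONOMY[term]["broader"]) if term else []


def narrow(query):
    """Return the narrower terms that could be offered to narrow a query."""
    term = resolve(query)
    if term is None:
        return []
    return sorted(t for t, e in TAXONOMY.items() if term in e["broader"])


if __name__ == "__main__":
    print(broaden("CA104"))
    print(narrow("high tensile strength alloys"))
```

Note that aluminum bronze sits under two broader terms in this sketch, which mirrors the point that more than one response can be valid depending on the context.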

In a world of standards it might be surprising to know there are no standards for taxonomy development. ISO 25964 is a standard for thesaurus development. Using it is one way of helping to take a standardized approach to a taxonomy. There are no right and wrong ways to develop a taxonomy. The core taxonomy team (and it will take a team) will be rich in business experience but even that experience can push a taxonomy in a certain direction. A metallurgist may talk about aluminum bronze but a product engineer may be looking for high tensile alloys.

Link Taxonomy to Broader Objectives

Start with an understanding of what a taxonomy will bring to the search application. Down the line, you'll also need to know how the taxonomy will be ingested into the search application and how you can make any changes once it's there. Taxonomy terms are often used to drive promoted content or to weight a ranking in favor of a specific query term or group of terms. Which means you must have a search application! Too many taxonomies are built as academic exercises.
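
To make the idea of weighting a ranking more concrete, the following Python sketch boosts results that are tagged with chosen taxonomy terms. It assumes each result arrives as a `(doc_id, base_score, tags)` tuple and that a simple multiplier is an acceptable boost; the `rerank` function and the default `boost` value are hypothetical, and real search applications expose their own boosting mechanisms.

```python
def rerank(results, boosted_terms, boost=1.5):
    """Re-rank results, favoring documents tagged with boosted taxonomy terms.

    results: list of (doc_id, base_score, tags) tuples (an assumed format).
    boosted_terms: set of taxonomy terms that should be favored.
    boost: multiplier applied when a document carries a boosted term.
    """
    rescored = []
    for doc_id, base_score, tags in results:
        factor = boost if boosted_terms & set(tags) else 1.0
        rescored.append((doc_id, base_score * factor))
    # Highest score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    results = [
        ("general-copper-guide", 1.0, ["copper alloys"]),
        ("aluminum-bronze-datasheet", 0.8, ["aluminum bronze"]),
    ]
    for doc_id, score in rerank(results, {"aluminum bronze"}):
        print(doc_id, round(score, 2))
```

In this example the datasheet starts with the lower base score but moves to the top once the taxonomy term is boosted, which is the kind of behavior you would want to be able to adjust after the taxonomy has been ingested.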

Two big decisions you'll have to make when developing your taxonomy are how wide it should be in top-level terms, and how deep. There are no easy answers to either of these, but Patrick Lambe's "Organising Knowledge" and Heather Hedden's "The Accidental Taxonomist" are essential reading during the design process. Attending one of the annual Taxonomy Bootcamp events in London and Washington might also help.

When in doubt, make sure you are keeping a balance between the development effort and the user benefit. This requires a strong understanding of what, why and how users are searching — a level of understanding that is unfortunately missing in organizations of all sizes.

Testing the Taxonomy

Testing a taxonomy involves many of the same techniques that are used in information architecture development for websites and intranets. Lambe's book includes a useful list of tests, found below. 

The standard measures of quality are:

  • Intuitive — easy to navigate and use.
  • Unambiguous — does not offer alternatives.
  • Hospitable — can accommodate all content.
  • Consistent and predictable — provides context.
  • Relevant — reflects user perspectives.
  • Parsimonious — no redundancy or repetition.
  • Meaningful — provides context.
  • Durable — will not need constant change and checking.
  • Balanced — even levels of detail and depth.

To these I would add "equivalence" for situations where the taxonomy is being developed in more than one language. The dictionary definition may not be the same as on-site usage.

A common mistake is to test only the final version once it's integrated into the search application. Even with a very detailed knowledge of the ranking models (static and dynamic) it may not be possible to decide whether the taxonomy is meeting your objectives. This is especially the case with phrases such as "aluminum bronze" and "high tensile alloys," where the unpredictable Boolean treatment of these phrases can be very difficult to unravel.

Crowdsourcing Terms

Taxonomies often start with running content through a text analysis application to generate an initial list of terms. This will pick up the terms in a document but will fail to support people asking for a “twenty eight seventy four alloy” (the BS2874 designation mentioned earlier). This is also where the use of chatbots can be a challenge. Once the taxonomy is in service it can be valuable to create a form to collect new terms, but this requires the resources to respond to suggestions and revise the taxonomy. Otherwise expect an initial flood of terms, no reaction from the search team and then a precipitous decline in user suggestions.
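
A very simplified Python sketch of that first text analysis pass is shown below. It counts single words and two-word phrases across a set of documents, which is only a stand-in for what a real text analysis application does; the `candidate_terms` function, the stopword list and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to", "with"}


def candidate_terms(documents, max_words=2, min_count=2):
    """Count words and short phrases as candidate taxonomy terms.

    Returns (term, count) pairs, most frequent first, keeping only terms
    that occur at least min_count times across all documents.
    """
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z0-9]+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip phrases that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(t, c) for t, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    docs = [
        "Aluminum bronze is a copper alloy.",
        "Specify aluminum bronze to BS2874.",
    ]
    print(candidate_terms(docs))
```

The written form "bs2874" is found because it appears in the content, while a spoken form such as "twenty eight" never appears in the documents and so can never be suggested by this kind of analysis. That gap is one reason to collect terms from users as well.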

Next-Generation Taxonomies

Knowledge graphs and other semantic technologies are now being widely used to create what might be regarded as next-generation taxonomies. The danger here is that the more distant the relationship, the more likely it is that the linking logic has made a non-logical jump. My advice would be to start small but design for expansion, ensure the business is closely involved, test and test again, and above all have a very clear objective for the benefits that will be brought to the employee desktop.