Search is an incredibly interesting problem, one that’s so complex in the background yet so simple on the surface. In this two-part article, we examine the desire to duplicate the Google search experience in the enterprise and how we need to change what we expect from enterprise search based on what we’re willing to do to make it work.

What could be easier than entering a few keywords into a single text box and, in a fraction of a second, being granted access to tens or even hundreds of millions of relevant resources, with all the information we could ever really want right at our fingertips? For most, this near-perfect user experience is today’s reality when we search online using Google.

Yet a common complaint heard internally across many organizations is the inability to easily find the right answer among a far smaller, and often less relevant, set of results. The organization of content into intuitive information architectures is a challenging problem, and the creation of navigational constructs that classify information into meaningful categories is becoming increasingly difficult due to the sheer volume of content being produced.

The user experience is becoming both more complicated and more fragmented, placing a greater emphasis on search as the silver bullet. Unfortunately, search, too, is failing to meet the needs of our users and is oftentimes perceived as nothing more than “a random document generator”, as one client has colorfully put it.

Why Can’t We Just Get Google?

Interestingly, this is a common question asked during many of our intranet redesign initiatives. We hear it from end-users on the frontlines all the way up to senior level management, executives and everywhere in between. But before we look for an answer, let’s take a step back and build a bit of a foundation by examining some of the fundamentals of web search itself.

As human visitors to a web page, we expect to see a variety of visual cues embedded within the interface. Graphic design, eye-catching imagery and the logical layout of content are all elements that appeal to us as we interact with a site and its content.

In their absence, the site’s ability to keep us engaged dramatically diminishes and we quickly lose interest. In contrast, the search engine’s experience of the same page is purely textual, ignoring most if not all of the parts that draw us in. To illustrate, let’s take a look at the difference between what a visitor sees versus what Google sees:

Visitor View

visitor-view.jpg

Google View

googlebot-view.jpg

Text-only cached version of this page in Google

Google’s crawler, the Googlebot, has the primary function of finding, consuming and indexing content from across the web. But it doesn’t stop there. More attributes are taken into consideration in order to determine relevancy and display of the appropriate search results for a particular query.

A short video from Matt Cutts, Principal Engineer at Google, shows at a high level how the search engine locates, indexes and ranks web documents:

Essentially, Google traverses the internet by following links and consuming the content it finds along the way. It attempts to implicitly derive the meaning of a document from its content by examining terminology. To put all that text into context, it uses signals such as the occurrence of words and the relative value of those occurrences, including positioning, weight and semantic relationships, to infer relevancy. It does so by asking questions, “more than 200 of them”, of the document itself, as well as of the document’s context within the larger corpus of indexed content (see the How Google Search Works (1:16) video above).
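To make the crawl, index and rank cycle described above a little more concrete, here is a minimal sketch in Python. It is in no way Google's algorithm: the four-page "web" in `PAGES`, the `a.example` URLs and the function names are all hypothetical, and a simple term-frequency times inverse-document-frequency weighting stands in for the many signals a real engine uses.

```python
from collections import Counter, deque
import math
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("Twister game rules and party games",
                       ["a.example/movie", "a.example/weather"]),
    "a.example/movie": ("Twister the 1996 movie about tornado chasers",
                        ["a.example/weather"]),
    "a.example/weather": ("How a tornado forms: tornado science explained", []),
    "a.example/orphan": ("Unlinked page about tornado shelters", []),
}


def crawl(seed):
    """Follow links breadth-first from the seed; return {url: text} for reachable pages."""
    seen, queue, found = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        found[url] = text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {url: number of occurrences}."""
    index = {}
    for url, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index.setdefault(term, {})[url] = count
    return index


def search(index, n_docs, query):
    """Rank pages by summed term frequency times a smoothed inverse document frequency."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for url, tf in postings.items():
            scores[url] += tf * idf
    return [url for url, _ in scores.most_common()]


docs = crawl("a.example/home")
index = build_index(docs)
# The weather page mentions "tornado" twice, so it ranks ahead of the movie page.
print(search(index, len(docs), "tornado"))
```

Note that the page nothing links to is never discovered, even though it mentions the query term. A crawler of this kind can only find what links lead it to.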

The algorithms underlying the technology are composed of complex mathematical formulae that have been continuously evolving over the past decade in an effort to solve problems specific to the indexation and ranking of web content. They have quite often been altered in response to unscrupulous webmasters who have attempted to game the system by exploiting unaddressed gaps in an effort to obtain high search rankings.

To improve both quality and relevance of the search results, more than 200 unique attributes are captured and applied to indexed documents. We can think about the automated application of each of these signals as Google’s approach to content enrichment.

While these attributes form the foundation of Google’s secret sauce, numerous experts in the industry have made attempts to determine what many of them actually are. Creating a text cloud of the results of their analysis offers the following high-level insight:

google-seo-ranking-signals.jpg

Generated using http://www.seomoz.org/article/search-ranking-factors#ranking-factors & http://www.wordle.net

On the web, domain factors and keyword use in page text (properties), as well as in internal and external links, are among the more important attributes when it comes to assigning relevancy in internet search.

Editor's Note: Read more articles by Jeff Carr, starting with: SharePoint 2010: Using Taxonomy & Controlled Vocabulary for Content Enrichment

Ambiguity, Intent and Encouraging Conversation

If we take a look at another piece of the puzzle we see that further complicating the problem of search are the searchers themselves. Search queries are often ambiguous and generally do not express exactly what it is that the searcher is actually after.

Even the best search engines in the world, including Google, are unable to resolve intent based on the entry of a few keywords into a simple text box. Take for example the query term “twister”, which has roughly 550,000 searches per month (taken from Google AdWords Keyword Tool). It’s virtually impossible for the search engine to understand the intent of the searcher that enters this query. Is the person looking for information about…

  • The 1960s game from Milton Bradley?
  • The 1996 movie starring Helen Hunt and Bill Paxton?
  • The science behind tornadoes?
  • Maintenance tips for a Honda Twister 250 sport bike?
  • A promotion offered by the radio station KTST-FM 101.9 the Twister?
  • A tongue twister the searcher used to know as a kid but has since forgotten?
     

The primary challenge lies in the search engine’s ability to extract context based on innovative interaction. Different people entering the same query term might in fact be searching for different things altogether, so to arrive at an answer the search engine must begin a process of disambiguation. Sometimes it occurs during query construction through auto-complete or keyword suggestion:

disambiguation-using-auto-suggest.jpg

Other times it comes in the form of related searches:

disambiguation-through-related-searches.jpg
 

The concept of universal search is also used to integrate potentially relevant content from shopping, image, video, news and social sites like Twitter. If none of these approaches offers the insight required, the searcher often then refines the original query by entering additional keywords that provide further clarity. This back-and-forth interaction effectively becomes a conversation between the technology and the searcher, an iterative process required to connect the person searching with the most appropriate search result.
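The two disambiguation aids described above, query suggestions and related searches, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the query log and its counts are invented, the function names `suggest` and `related` are hypothetical, and ranking purely by how often a query was logged is a stand-in for whatever a production search engine actually does.

```python
from collections import Counter

# Hypothetical query log; the queries and counts are invented for illustration.
QUERY_LOG = Counter({
    "twister game": 120,
    "twister movie": 95,
    "twister tornado facts": 60,
    "tongue twister": 40,
    "twitter": 300,
})


def rank(matches, limit):
    """Order (query, count) pairs by count descending, then alphabetically."""
    ordered = sorted(matches, key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ordered[:limit]]


def suggest(prefix, log=QUERY_LOG, limit=3):
    """Auto-complete: the most frequent logged queries that start with the prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    return rank([(q, n) for q, n in log.items() if q.startswith(prefix)], limit)


def related(query, log=QUERY_LOG, limit=3):
    """Related searches: logged queries sharing at least one word with the query."""
    query = query.strip().lower()
    words = set(query.split())
    matches = [(q, n) for q, n in log.items()
               if q != query and words & set(q.split())]
    return rank(matches, limit)


print(suggest("twister"))
print(related("twister", limit=4))
```

Each suggestion the searcher accepts adds words to the query, which is exactly the kind of extra context the engine cannot infer from "twister" alone.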

In addition, the simplicity of the Google Experience has also led to incredibly high expectations. The company’s philosophy on designing for web search takes into consideration the whole problem of search (Google Searchology 2009), meaning that:

  • If users can’t spell, it’s our problem.
  • If they don’t know the syntax of search, it’s our problem.
  • If there is not enough content, it’s our problem.
  • If they can’t speak the language, it’s our problem.
  • If the web is too slow, it’s our problem.

We can learn a lot from these statements, but what it really boils down to is that improving information access through enterprise search within our organizations is our problem, not the problem of our users. In some ways this has made the design and implementation of enterprise search initiatives significantly more challenging. We often look for the easy solution since the Google Experience has taught us to expect simplicity.

But we need to keep in mind that Google as a company employs thousands of very talented and intellectual people working toward solving the single problem of search. Admittedly, web search is a problem that is far from being solved, and even the world’s leading search company believes there’s still much work that needs to be done. In a recent blog post taken from the L.A. Times, Gabriel Stricker, Google’s Director of Global Communications and Public Affairs, stated, "Search is at the heart of everything we do, and as we've said many times, it's still an unsolved challenge."

As purveyors of search technology and functionality for our organizations, we can no longer approach search as just an application that is plugged in and turned on. To be successful, we must take a look at the problem from a different perspective and begin to view it as more of an experience, one that’s constantly evolving and is as unique as the people for whom it’s intended.

Part 2 of Enterprise Search and Pursuit of the Google Experience will examine shifting the perspective to the user experience, and Jeff offers 6 steps to improve enterprise search.