Search engines work well with sentences. They take them to pieces to build an index that gives users a reasonable chance of finding relevant information.
Unfortunately people use language in a very different way, and this presents a substantial challenge to effective search.
At the IntraTeam event in Copenhagen in March, I was impressed by many of the presentations from multinational companies in which users were free to choose the language they used for internal social networking. Doesn't everyone want to express social emotions in a familiar language?
But the presenters made no mention of problems searching this content. Outside the conference room, all of them admitted that searching social language posed a significant challenge.
Computational Social Linguistics
To understand the implications of social language for search, a good place to start is a 2016 survey paper published in Computational Linguistics. Although it is an academic paper, it's an easy read and raises good points, such as that the stemmers and parsers that work well with standard text are far less effective with social language.
Another challenge with the social use of language is the significant regional variation within many languages. Latin American Spanish is a good example, and Brazilian and European Portuguese differ not only in vocabulary but also in grammar. This can also lead to problems in entity extraction, as the German city known as both Cologne and Köln illustrates.
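One way to picture the entity extraction problem is as a lookup from surface forms to a single canonical entity. The sketch below is only an illustration under my own assumptions: the alias table, the entity ID "city:koeln" and the function names are hypothetical, and a real search application would draw on a much larger gazetteer.

```python
import unicodedata
from typing import Optional

# Hypothetical alias table: every surface form maps to one canonical entity ID.
CITY_ALIASES = {
    "cologne": "city:koeln",
    "köln": "city:koeln",
    "koeln": "city:koeln",
    "koln": "city:koeln",
}

def normalize(token: str) -> str:
    """Apply Unicode NFC and case folding so that variant encodings and
    capitalizations of the same word compare equal."""
    return unicodedata.normalize("NFC", token).casefold()

def resolve_entity(token: str) -> Optional[str]:
    """Return the canonical entity ID for a token, or None if it is unknown."""
    return CITY_ALIASES.get(normalize(token))

print(resolve_entity("Cologne"))
print(resolve_entity("Köln"))
```

Both calls resolve to the same entity, so a query using either name can match content written with the other. The table still has to be maintained for every language and spelling habit your employees use.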
Speak, Read, Write and Understand
Another consideration is the diversity of employees' skills in speaking, reading, writing and understanding a language through listening.
Someone who is a fluent speaker in a second language may still have problems writing a blog post in that language. They might use a semantically incorrect word that will be understood in a social context much more readily than by a search application.
Americans usually use the term "to slate" in the context of setting an agenda, but in British usage we often use "slate" to indicate criticism of a performance or presentation. Both are correct, as they come from different etymological foundations.
Sharing Knowledge and Expertise
The use of a non-native language also has implications for creating expertise profiles and sharing knowledge.
In the UK my professional qualifications of FRSC and FBCS are reasonably well recognized, but outside of the UK, they mean nothing. I have two different business cards, with these qualifications only on the UK card. When employees have to write out their expertise in a second (usually English) language, do they have the skills to write even a reasonably ‘accurate’ profile?
Sharing knowledge is also a problem. Results from a recent study of a Finnish company show the use of a non-native language can make knowledge sharing an ambiguous and costly process, eroding some of the benefits of knowledge sharing.
Indexing Discussion Threads
Social networking also raises the problem of how to handle discussion threads. For example, I might post that I have experience working in pharmaceutical companies. Someone else in my company might respond, "So do I."
How does this information get found when someone in my company is searching for people with expertise in the pharmaceutical sector? Will search results present this thread within a context that is understandable?
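One possible answer is to index each reply together with the text of the posts it responds to. The sketch below assumes a simple parent/child thread model. The Post class, the field names and the substring search are all hypothetical stand-ins for whatever your social platform and search application actually provide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    post_id: str
    author: str
    text: str
    parent_id: Optional[str] = None  # None for the first post in a thread

def build_index_docs(posts):
    """Create one index document per post, carrying the text of its
    ancestors in a separate 'thread_context' field."""
    by_id = {p.post_id: p for p in posts}
    docs = []
    for post in posts:
        context, seen = [], set()
        parent_id = post.parent_id
        # Walk up the thread; 'seen' guards against malformed cyclic data.
        while parent_id in by_id and parent_id not in seen:
            seen.add(parent_id)
            parent = by_id[parent_id]
            context.append(parent.text)
            parent_id = parent.parent_id
        context.reverse()  # oldest ancestor first
        docs.append({
            "id": post.post_id,
            "author": post.author,
            "text": post.text,
            "thread_context": " ".join(context),
        })
    return docs

def find_authors(docs, term):
    """Naive substring match over a post's own text plus its thread context."""
    term = term.casefold()
    return [d["author"] for d in docs
            if term in f'{d["text"]} {d["thread_context"]}'.casefold()]

posts = [
    Post("1", "author_a", "I have experience working in pharmaceutical companies."),
    Post("2", "colleague_b", "So do I.", parent_id="1"),
]
print(find_authors(build_index_docs(posts), "pharmaceutical"))
```

Keeping the inherited text in its own field means it can be weighted lower than the post's own words at ranking time, and shown alongside the reply so that the result makes sense to the person reading it.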
Implications for Search
When creating a search strategy that includes social networking, you must first decide whether to have a specific search application for social media. Such an application would include the language management modules and ranking options needed to produce effective search results, especially around collaboration and knowledge sharing.
If the answer is yes, your next consideration is how to integrate these search results into an enterprise-wide search application. If I search for "pharmaceutical projects" will I pick up the person who made the "So do I" comment above?
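To make the integration question concrete, here is a minimal sketch of one well-known way to merge two ranked lists, reciprocal rank fusion. I am not suggesting any particular search product works this way. The document IDs are invented, and k=60 is simply the constant commonly quoted for this method.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of IDs into one ranking.

    Each appearance of an ID contributes 1 / (k + rank), with rank starting
    at 1, so items ranked highly in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for the same query from two separate applications.
document_hits = ["report-17", "thread-42", "report-03"]
social_hits = ["thread-42", "thread-99"]

print(reciprocal_rank_fusion([document_hits, social_hits]))
```

Because this approach uses only rank positions, it sidesteps the problem that relevance scores from a document index and a social index are rarely comparable. Whether the merged order is actually sensible for your users is something only testing will show.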
Deciding to create a combined index of both document and social text requires working through indexing, ranking and presentation implications in detail, taking into consideration queries run across test collections of related document and social material. It also has implications for crawling, because employees will assume (trust me, they will) that social content is being indexed in real time.
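Running queries across test collections can be as simple as scoring the top results against a set of relevance judgments. The sketch below uses precision at k, grouped by region. The test case layout and the search_fn callable are my own assumptions, standing in for your test collection and your search application.

```python
from collections import defaultdict
from statistics import mean

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k

def mean_precision_by_region(test_cases, search_fn, k=5):
    """Average precision at k for each region in the test collection.

    Each test case is a dict with 'region', 'query' and 'relevant' keys
    (a hypothetical layout); search_fn takes a query string and returns
    a ranked list of IDs.
    """
    scores = defaultdict(list)
    for case in test_cases:
        ranked = search_fn(case["query"])
        scores[case["region"]].append(precision_at_k(ranked, case["relevant"], k))
    return {region: mean(values) for region, values in scores.items()}
```

A low score for one region against the others is a prompt to look at how the social language used there is being indexed.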
The Balance Is Changing
Companies now view having multiple social network applications as a benefit, and all I can say to that is no comment!
The challenges of searching multiple social networks and integrating the results into a sensibly ranked presentation will only increase. Writing a strategy is a start. But it's only through testing your search application in practice (and for multinational firms that means every region with a distinct social language) that you will know if you're delivering results that matter.