Text analytics and text mining have until recently remained on the periphery of my interest in enterprise search.
But my consulting work called on me to become instantly familiar with text mining technology, with the expectation that I would gain not only an in-depth understanding but also the ability to forecast the future.
Text Mining's Come a Long Way
Marti Hearst authored a research paper in 1999 that brought text mining to the forefront for many. In it, Hearst commented that, “The nascent field of text data mining (TDM) has the peculiar distinction of having a name and a fair amount of hype but as yet almost no practitioners.”
The summary offered another memorable statement: “For almost a decade the computational linguistics community has viewed large text collections as a resource to be tapped in order to produce better text analysis algorithms. In this paper, I have attempted to suggest a new emphasis: the use of large online text collections to discover new facts and trends about the world itself.”
Text mining has thankfully progressed since 1999.
In the last five years in particular, text mining's value has become very evident, especially in the analysis of social media and in bioinformatics. By my count, the market boasts at least 60 active and visible vendors, a commercial community larger than that for enterprise search. Stephen Arnold summarized the current position of some of the leading vendors in an article that examines 30 of them.
Reading Into Deep Text
Up until very recently, gaining an understanding of the technology and applications behind text mining involved reading highly technical books with a strong academic focus. Alta Plana founder Seth Grimes, who has championed all aspects of text analytics for over a decade, has been the exception.
Playing a less visible role is KAPS Group founder Tom Reamy, who recently published the very informative book, "Deep Text." In the book, Reamy highlights what text mining can deliver, with just enough technical detail for readers to understand that it is not magic. Although the book runs over 400 pages, I read it cover to cover because of the conversational style and deep insight with which it is written. Consider this essential reading, even for search managers.
Mixing Up Search and Mining
Reamy covers the potential influence of text mining on enterprise search well. While the two are complementary, some important differences separate them.
For example, text mining is usually applied to a static collection of perhaps several million documents, so the addition of more recently published research papers will not dramatically change the outcomes. The expectation with enterprise search is that even the most recent documents will be indexed in seconds.
Historically, vendors usually fell into one area or the other: either search or text mining. Attivio has been at the forefront of offering a search/analytics platform, and it has now been joined by Sinequa, which positions itself as offering a cognitive insight platform. In effect, Sinequa is combining its enterprise search technology with a substantial amount of natural language processing and analytics technology. That is not quite the text mining business, but it is getting very close.
The advent of such tools does not suggest, as many have claimed, that enterprise search is broken and therefore unworthy of investment. The root of such complaints lies not in the investment in search technology, but rather in the lack of investment in the support team.
For those who find enterprise search a challenge, the computational linguistics and machine learning of text mining will be a quantum leap in some respects, but very familiar in others.
Reamy points out that a move into enterprise text analytics requires a multi-disciplinary team, a point that I have, of course, been making for years about enterprise search.
New Opportunities for Search Vendors?
It will be interesting to see how the vendor market reacts to the massive opportunities that a combination of search and analytics will offer.
My current assumption (come back in a year!) is that both commercial and open source enterprise search vendors will move to add analytics. What will drive this turn is not the technology, but rather the fact that text analytics vendors have more opportunities than they can currently cope with. Current estimates value the text analytics market at $3 billion (with forecasts that it will reach almost $6 billion by 2020). This opportunity may also give commercial enterprise search vendors an edge over open source search vendors.
Anyone interested in search strategy would be wise to follow developments in this area.