
Sometimes it feels as if two diametrically opposed camps exist in the enterprise search space: commercial, proprietary software on one side and open source on the other. I'd suggest the same is true across many types of software.

Many — perhaps most — software products in use in organizations today are commercial and proprietary. Software vendors have traditionally worked this way: A company designs the product, their developers and authorized contractors are the only people with access to the source code, and only they and their authorized partners can sell or license the product. 

This traditional model is often called "closed source" because the company that creates, sells and supports the product owns the entire product. The software tends to be more expensive, because as we all recognize, commercial organizations have to make a profit, or they don’t stay around long.

A different model emerged in the '80s: freely shared software, which later came to be called open source. From the start, the idea was to create software that anyone could download and use for free. On top of that, anyone could modify the source code and submit modifications and bug fixes back to the original project. Over time, several different types of open source licenses evolved, and a number of software products we use every day were created.

If you work with computers today, chances are high you are using open source software. Open source operating systems have become common: Unix, the proprietary AT&T product, became the model for several open source operating systems, including BSD derivatives such as FreeBSD, and it inspired Linux as well. And while Microsoft Windows is not an open source product (yet), Microsoft does make the source code available to government agencies, universities and, occasionally, corporations.

A New Model Emerges: Commercial on the Outside, Open Source on the Inside

In the early days of the web, a group known initially as the Apache Group, now the Apache Software Foundation, was developing a free and open source web server, the Apache HTTP Server. The organization has since expanded into many other projects. Because many of the projects are assigned names of animals (from Ant to ZooKeeper), they are often collectively known as the "Apache Zoo," as I wrote back in 2015.

But in recent years, we've seen a new trend. The enterprise search platform has turned its focus to the user and administrator experience, making the underlying tools less important. This is true even in commercial products. Many companies with "closed source" products have jumped on the open source bandwagon. Behind the scenes, most commercial search platforms are now using open source tools like Lucene, Solr, Spark, Hadoop, Mahout, Nutch and more.

If you use enterprise search today, especially if you use capabilities like machine learning, you are likely using Apache software. Lucene, the software that drives both Solr and Elasticsearch, is an Apache project. Are you considering machine learning or "AI" for your ecommerce? You'll probably be using open source Apache Spark or Apache Mahout. And this is true whether the product you use is free and open source, or a commercial product.

Enterprise search vendors are increasingly integrating open source tools under the covers. This shift has taken place both because of the quality of Apache Software Foundation projects and because the magic inside those projects' tools is hard to master. Enterprise search software solves real-world problems in and around finding content anywhere in the organization. Enterprise search is not the kernel; it is the user and administrative interfaces, the crawler, and the indexing and reporting tools. These elements make up the proprietary application, because they are easier to create than the core engine. And many successful commercial search companies have realized it's easier, and safer, to use an Apache tool than to invent their own proprietary solution.


The Cost of Open Source

Just because a project is free to download and use, don't assume implementing open source-based applications will be free. As with any significant software product, costs add up quickly: you'll need skilled developers or consultants, training, documentation and support. Fortunately, a number of search vendors provide just that, depending on the open source tool. And many of these vendors employ a significant number of the "committers," the people who actually write the open source code. That also gives the vendors some influence over which enhancements get included in future releases.


DIY Open Source Enterprise Search?

With a growing number of enterprise search products based heavily on open source software that anyone can download, you might be tempted to do it yourself to save money.

While I know of several organizations that use both Lucene and Solr internally and externally, I'd generally advise against it. Rather than license a fully supported product, you'll have to maintain in-house experts in the search platform you choose. You'll have to build up expertise not only in search but also in indexing content and implementing security, and your staff will have to keep up to date on the platform. Yes, organizations have successfully done this, but for many organizations, a product with support is far preferable to DIY enterprise search.
