Semantic search has been the new black in the high fashion of content management and the industries around it. Nstein, a provider of Web CMS, DAM and text-mining technologies, has just released a new product called Semantic Site Search, which it says is more flexible, intuitive and extensible than the Google Search Appliance (GSA). The vendor humbly refers to it as the “new kind of site search.”
We had a chance to get an early demo and talk to Eric Williams, Nstein’s product manager, who told us all about his "little baby," code-named 3S. It may be a newborn, but it comes with a strong feature set: multi-index federated search, an embedded Text Mining Engine, semantic widgets and a more flexible presentation layer.
Search is Not Easy, or Can Google Save Us?
Let’s start with the fact that, despite all the current technologies, search is still a pain. For all the tagging with keywords and metadata, we still don’t always get what we want: relevant results.
Nstein seems to have put a lot of thought into this problem and came up with 3S (Semantic Site Search), which uses the company’s proprietary search algorithms and text-mining technology to rise to the challenge of actually finding the data users are looking for.
Not without a bit of rivalry, Nstein makes a point of reminding us that GSA lacks some of the features 3S provides.
Some would probably agree that tweaking GSA’s algorithms and settings can be a daunting task: there are many parameters, yet little freedom to modify them or to build your own features on top (unless, of course, you buy one of Google’s own add-ons, such as “Did you mean…”).
How Nstein 3S Works
3S combines a multi-index federated search engine with a presentation framework (a web application built on top of the engine). It also embeds Nstein’s Text Mining Engine (TME) to generate semantic facets.
3S indexes content from various content properties (blogs, CMS, websites). TME then applies its semantic analysis and categorization to enrich and annotate the data, exposing the underlying meaning of the content and its metadata.
Using a fairly intuitive UI, you can visually tweak how results are ranked, starting from the proprietary algorithms Nstein built, with the ultimate goal of producing relevant, faceted search results that match your business requirements.
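To make the enrichment step concrete, here is a minimal sketch, assuming the simplest possible stand-in for TME: a hypothetical hand-written gazetteer that maps phrases to facet values. TME’s real models and categories are proprietary and not described in this article; the `Document` type, `GAZETTEER` and `enrich` names are illustrative only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical gazetteer standing in for TME's entity knowledge. The real
# engine's models and categories are proprietary and not shown here.
GAZETTEER = {
    "barack obama": ("person", "Barack Obama"),
    "hillary clinton": ("person", "Hillary Clinton"),
    "white house": ("organization", "White House"),
}


@dataclass
class Document:
    doc_id: str
    source: str  # content property, e.g. "blog", "cms", "website"
    text: str
    facets: dict = field(default_factory=dict)  # facet name -> set of values


def enrich(doc: Document) -> Document:
    """Annotate a document with facet values via whole-phrase matching."""
    lowered = doc.text.lower()
    for phrase, (facet, value) in GAZETTEER.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            doc.facets.setdefault(facet, set()).add(value)
    return doc


doc = enrich(Document("1", "blog", "Barack Obama met Hillary Clinton at the White House."))
print(sorted(doc.facets["person"]))  # ['Barack Obama', 'Hillary Clinton']
```

The facet values attached here are what a faceted search UI would later count and filter on.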
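Nstein does not disclose how its ranking works. One common way a tuning UI exposes relevance, though, is through per-field boost weights. The sketch below assumes that model purely for illustration; `DEFAULT_BOOSTS`, `score` and `rank` are hypothetical names, not 3S settings.

```python
from collections import Counter

# Hypothetical per-field boosts. 3S's ranking algorithms are proprietary;
# these weights only illustrate the kind of knob a tuning UI might expose.
DEFAULT_BOOSTS = {"title": 3.0, "body": 1.0}


def score(doc: dict, terms: list, boosts: dict) -> float:
    """Sum each field's term frequencies, multiplied by that field's boost."""
    total = 0.0
    for fld, weight in boosts.items():
        counts = Counter(str(doc.get(fld, "")).lower().split())
        total += weight * sum(counts[t.lower()] for t in terms)
    return total


def rank(docs: list, terms: list, boosts: dict = DEFAULT_BOOSTS) -> list:
    """Order documents by boosted score, highest first."""
    return sorted(docs, key=lambda d: score(d, terms, boosts), reverse=True)


docs = [
    {"id": "a", "title": "budget", "body": "administration administration"},
    {"id": "b", "title": "administration", "body": ""},
]
print([d["id"] for d in rank(docs, ["administration"])])  # ['b', 'a']
print([d["id"] for d in rank(docs, ["administration"], {"title": 1.0, "body": 2.0})])  # ['a', 'b']
```

Moving a single slider (here, the body boost) flips the order, which is the kind of visual, trial-and-error tuning the UI is meant to make easy.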
On top of that, 3S provides tools to embed advertising campaigns and add semantic widgets.
Semantic Widgets and Mashups
According to Nstein, 3S comes out of the box with a widget server and several sample widget implementations. Widgets can be added to any of your web pages to display filtered, relevant or related content from pretty much any source. The widget server controls the widgets and applies widget themes to them.
Nstein 3S Sample Widget
According to Nstein, themes are not difficult to create on your own, so more extensive widgets can be built just by stripping down a theme in the presentation framework. In most cases, it takes no more than a line of code:
Code Snippet: Just One Line of Code
You can also plug widgets into external blogs, for example, by invoking the widget server remotely to build a cross-linking widget.
It gets even more interesting when you combine widgets with the Template Engine to build search-based mashups that draw on content across different properties.
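A "related content" widget boils down to finding items that share semantic metadata with the page it sits on. The sketch below assumes documents already carry facet values (as produced by an enrichment step) and ranks neighbors by the number of shared values. The JSON payload format and every function name are hypothetical; the article does not document the widget server’s actual protocol.

```python
import json


def facet_pairs(doc: dict) -> set:
    """Flatten {'person': {'A', 'B'}} into {('person', 'A'), ('person', 'B')}."""
    return {(f, v) for f, values in doc["facets"].items() for v in values}


def related_items(target: dict, corpus: list, limit: int = 3) -> list:
    """Rank other documents by how many facet values they share with target."""
    wanted = facet_pairs(target)
    scored = []
    for doc in corpus:
        if doc["id"] == target["id"]:
            continue
        overlap = len(wanted & facet_pairs(doc))
        if overlap:
            scored.append((overlap, doc))
    # Most shared values first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [doc for _, doc in scored[:limit]]


def widget_payload(target: dict, corpus: list, limit: int = 3) -> str:
    """JSON a remotely invoked cross-linking widget might receive (hypothetical format)."""
    items = related_items(target, corpus, limit)
    return json.dumps([{"id": d["id"], "title": d["title"], "source": d["source"]} for d in items])


corpus = [
    {"id": "1", "title": "Obama budget", "source": "cms",
     "facets": {"person": {"Barack Obama"}, "topic": {"budget"}}},
    {"id": "2", "title": "Budget blog", "source": "blog", "facets": {"topic": {"budget"}}},
    {"id": "3", "title": "Obama and the budget", "source": "website",
     "facets": {"person": {"Barack Obama"}, "topic": {"budget"}}},
    {"id": "4", "title": "Sports", "source": "blog", "facets": {"topic": {"sports"}}},
]
print(widget_payload(corpus[0], corpus))
```

Because the corpus spans several sources, the same call naturally produces cross-property links, which is the mashup idea described above.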
Themes and the Template Engine
3S comes with several themes for controlling the presentation of your search results page. When search results are sent to the presentation framework, the themes are applied and the final web page is produced.
Using different themes, you can also create microsites for clusters of similarly themed content and deploy them faster than usual. According to Nstein, developing a topical microsite with 3S is essentially as simple as performing a search.
Several samples are available out of the box, including:
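The idea that a theme is simply a presentation applied to raw results can be sketched with Python’s standard `string.Template`. The two theme names and their markup are hypothetical; 3S’s actual theme format is not described in the article.

```python
from html import escape
from string import Template

# Two hypothetical themes applied to the same results; names are illustrative.
THEMES = {
    "results": Template("<h1>Results for $query</h1>\n<ul>\n$items\n</ul>"),
    "microsite": Template("<h1>$query</h1>\n<ul class=\"topic-feed\">\n$items\n</ul>"),
}
ITEM = Template('<li><a href="$url">$title</a></li>')


def render(theme: str, query: str, results: list) -> str:
    """Apply a theme to raw search results to produce the final page."""
    items = "\n".join(
        ITEM.substitute(url=escape(r["url"]), title=escape(r["title"])) for r in results
    )
    return THEMES[theme].substitute(query=escape(query), items=items)


results = [{"url": "/news/1", "title": "Budget & taxes"}]
print(render("microsite", "budget", results))
```

Swapping the theme name while keeping the query turns a plain results page into a topical microsite page, which is why building one can be "as simple as performing a search."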
- Guided search: with AJAX facets and filtering; if you remove facets, you can develop your own keyword taxonomy.
Nstein 3S Guided Search
- Topic pages: with automatic search-based and metadata-driven content pages. One use is to apply a topic page theme to dynamically generate a keyword-based landing page for a specific user. According to Nstein, these pages require no editorial maintenance.
Nstein 3S Topic Page (based on q=administration AND fq=person:Barack+Obama AND fq=person:Hillary+Clinton query)
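The topic page caption’s query looks like Solr-style `q`/`fq` parameters. The sketch below assumes those semantics (a keyword query plus facet filters that are all ANDed together) and also computes the per-value facet counts a guided search UI would display. It is an in-memory illustration, not 3S’s query engine; all function names are hypothetical.

```python
from collections import Counter
from urllib.parse import unquote_plus


def parse_fq(fq: str) -> tuple:
    """Split a filter like 'person:Barack+Obama' into ('person', 'Barack Obama')."""
    field, _, value = fq.partition(":")
    return field, unquote_plus(value)


def guided_search(docs: list, q: str, fqs: list) -> list:
    """Keyword match on text, then AND together every facet filter."""
    filters = [parse_fq(fq) for fq in fqs]
    return [
        d for d in docs
        if q.lower() in d["text"].lower()
        and all(value in d["facets"].get(field, set()) for field, value in filters)
    ]


def facet_counts(hits: list, field: str) -> Counter:
    """Counts shown next to each facet value so the user can drill down."""
    return Counter(v for d in hits for v in d["facets"].get(field, set()))


docs = [
    {"id": 1, "text": "The administration said...",
     "facets": {"person": {"Barack Obama", "Hillary Clinton"}}},
    {"id": 2, "text": "Administration officials met", "facets": {"person": {"Barack Obama"}}},
    {"id": 3, "text": "Campaign news", "facets": {"person": {"Hillary Clinton"}}},
]
hits = guided_search(docs, "administration", ["person:Barack+Obama", "person:Hillary+Clinton"])
print([d["id"] for d in hits])  # [1]
print(facet_counts(guided_search(docs, "administration", []), "person"))
```

Each click on a facet value in a guided search UI amounts to appending one more `fq` filter and re-running the same query.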
With all that functionality in mind, let’s look at the architecture before trying to determine whether Nstein’s 3S can kick the Google Search Appliance’s butt in site search. Keep in mind that 3S is a site search engine, not to be confused with a web search engine.
Nstein 3S Architecture
Broker
The Broker processes content and transforms it into a form the search engine can ingest. It can ingest content in two ways:
- A RESTful interface for synchronous ingestion
- A hot folder mechanism for asynchronous batch processing
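The difference between the two ingestion modes can be sketched in a few lines. The sketch assumes a hot folder works the usual way: files dropped into an inbox are picked up in batches and moved aside once ingested. A synchronous REST call, by contrast, ingests immediately and reports back. `process_hot_folder`, `ingest_sync` and the `ingest` callback are hypothetical names; the Broker’s real endpoints are not documented here.

```python
import shutil
import tempfile
from pathlib import Path


def process_hot_folder(inbox: Path, done: Path, ingest) -> list:
    """One asynchronous batch pass: ingest each file in inbox, then move it to done."""
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(p for p in inbox.iterdir() if p.is_file()):
        ingest(path.name, path.read_text(encoding="utf-8"))
        shutil.move(str(path), str(done / path.name))  # never re-ingest the same file
        processed.append(path.name)
    return processed


def ingest_sync(name: str, body: str, ingest) -> dict:
    """What a synchronous REST call amounts to: ingest now, report the result."""
    ingest(name, body)
    return {"status": "indexed", "name": name}


index = {}  # stand-in for the search engine's ingestion endpoint
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp, "inbox")
    inbox.mkdir()
    (inbox / "story1.xml").write_text("<doc>one</doc>", encoding="utf-8")
    (inbox / "story2.xml").write_text("<doc>two</doc>", encoding="utf-8")
    batch = process_hot_folder(inbox, Path(tmp, "done"), index.__setitem__)
    leftover = list(inbox.iterdir())
print(batch)  # ['story1.xml', 'story2.xml']
print(ingest_sync("story3.xml", "<doc>three</doc>", index.__setitem__))
```

In a real deployment the batch pass would run on a timer or file-system watch, which is what makes the hot folder asynchronous from the producer’s point of view.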
SSE (Semantic Search Engine)
This component sits at the center of the system and combines retrieval technology, the Nstein Text Mining Engine and custom algorithms.
Presentation Framework
The Presentation Framework hosts scripts, scriptlets and search applications, all working to deliver large volumes of query results through a presentation layer by means of an API and a template system.
Semantic Widget Server
All data and metadata stored in the search indices is exposed via the Semantic Widget Server.
This component can also be used to deliver content remotely, for example, through classic search widgets on external sites.
Backoffice
The Backoffice is an administrative tool with a PHP-driven front end. This is where you tweak the Java-based search algorithms, manage indexes (force and edit results), design contextual advertising and manage campaigns (via a tie-in with the ad server), and perform general maintenance.
Clustering and Scaling 3S
Depending on business needs, you can get creative with 3S and deploy it on a single box or in a clustered environment.
In a single-box scenario, a component waits for content to be sent to it, checks the configuration when content arrives, and runs a transformer before sending the content to the Semantic Search Engine for indexing.
Nstein 3S Single Box Deployment
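The single-box flow described above (receive content, consult the configuration, run the matching transformer, hand the result to the indexer) can be sketched as a small dispatch table. The content types, transformer functions and `CONFIG` mapping are hypothetical; the article does not say what 3S’s configuration or transformers actually look like.

```python
import json
import xml.etree.ElementTree as ET


def from_json(raw: str) -> dict:
    data = json.loads(raw)
    return {"title": data.get("title", ""), "body": data.get("body", "")}


def from_xml(raw: str) -> dict:
    root = ET.fromstring(raw)
    return {"title": root.findtext("title", ""), "body": root.findtext("body", "")}


# Hypothetical configuration: which transformer handles which content type.
CONFIG = {"application/json": from_json, "application/xml": from_xml}


def handle(content_type: str, raw: str, index: list) -> dict:
    """Receive, look up the transformer in config, normalize, then index."""
    transform = CONFIG.get(content_type)
    if transform is None:
        raise ValueError(f"no transformer configured for {content_type}")
    doc = transform(raw)
    index.append(doc)  # stand-in for sending to the Semantic Search Engine
    return doc


index = []
handle("application/xml", "<doc><title>T1</title><body>B1</body></doc>", index)
handle("application/json", '{"title": "T2", "body": "B2"}', index)
print(index)
```

Keeping the mapping in configuration means a new content source needs only a new transformer entry, not changes to the ingestion loop.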
Organizations with high traffic but a limited budget can look at a mid-range configuration that still has one master (with the Presentation Framework, Semantic Search Engine and other components on one box) plus a failover machine.
In more complex enterprise deployments, the Presentation Framework and Semantic Search Engine are fully detached, and you can have any number of slaves based on one or more masters. In the middle layer, you can stage new algorithms to test their behavior without any impact on production (the top two boxes in the picture below).
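The article does not say how 3S switches to the failover machine. As one simple illustration of what a master-plus-failover pair buys you, the sketch assumes a front end that tries the master first and falls back on connection errors. The node callables are hypothetical stand-ins for real search endpoints.

```python
def query_with_failover(query: str, nodes: list) -> list:
    """Send the query to the master first; on connection failure, try the failover."""
    last_error = None
    for node in nodes:
        try:
            return node(query)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all search nodes unavailable") from last_error


def master(query):  # simulated outage
    raise ConnectionError("master unreachable")


def failover(query):
    return [f"result for {query}"]


print(query_with_failover("budget", [master, failover]))  # ['result for budget']
```

Only connection failures trigger the fallback here, so genuine query errors still surface instead of being silently retried.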
Nstein 3S Enterprise Deployment
Clustered production boxes can be set to replicate on demand (forced) or on a schedule, and replication can be set up or triggered from the Backoffice.
While it looks like a good effort on Nstein’s part, would 3S be something you’re willing to use in your organization, in combination with your current Web CMS? How would you fit it into your existing content management architecture and its mix of technologies?
According to Nstein, the goal is for 3S to be installed and running in 3-4 hours, at minimum; implementation time can go up depending on a variety of factors. Would you take them up on that claim and forgo the GSA?
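The two replication triggers (forced versus time-based) reduce to a simple decision rule. The sketch below is a hypothetical model of that rule plus a naive full-copy replication. The actual 3S replication mechanism is not described in the article, and the dates used are arbitrary sample values.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_replicate(now: datetime, last_run: Optional[datetime],
                     interval: timedelta, force: bool = False) -> bool:
    """Replicate on a forced trigger, on first run, or once the interval has elapsed."""
    return force or last_run is None or now - last_run >= interval


def replicate(master: dict, slaves: list) -> None:
    """Make every slave an exact copy of the master's index snapshot."""
    for slave in slaves:
        slave.clear()
        slave.update(master)


master_index = {"doc1": "v2", "doc2": "v1"}
slaves = [{"doc1": "v1"}, {}]
now = datetime(2000, 1, 1, 12, 0)
last = datetime(2000, 1, 1, 11, 30)
if should_replicate(now, last, timedelta(hours=1)):  # scheduled run not yet due
    replicate(master_index, slaves)
if should_replicate(now, last, timedelta(hours=1), force=True):  # forced from the back office
    replicate(master_index, slaves)
print(slaves)
```

A scheduled trigger keeps slaves eventually consistent with little operator effort, while the forced trigger covers urgent changes such as a corrected or pulled story.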