A Test: Which Vendor Wins at Sentiment Analysis?

Natural language processing and sentiment analysis have been popular artificial intelligence (AI) research topics for decades now.

Early sentiment analysis efforts were typically applied to significant bodies of text, like movie or book reviews. In today's intelligent digital workplaces, however, we are becoming entranced by the potential of AI chatbots and the use of AI to actively participate in human-led conversations.

Assessing Conversational Threads

Short, sharp, Twitter-like exchanges, however, give AI engines much less material to work with. This can have unintended negative consequences, as Microsoft’s Twitter chatbot, Tay, found recently.

Perhaps we're still quite far from being able to welcome a chatbot into our day-to-day conversations, but would a less ambitious goal of assessing the sentiment contained in a discussion now be within AI’s grasp?

Using existing sentiment analysis techniques to assess conversational threads raises some significant additional challenges.

Firstly, there is usually much less text, and it is much more succinct. Secondly, there is more than one speaker, so there is likely to be a mix of sentiments being expressed. Finally, there is context between speakers as they interact, which also has to be considered.

Because my team and I at SWOOP deal in enterprise-level conversations, our clients often raise the topic of sentiment analysis. In fact, it is something we have been monitoring for some time.

In this article, I'll share some of my initial findings from testing the sentiment analysis solutions whose vendors have been brave or confident enough to offer an online evaluation facility.

Sentiment Analysis Evaluation Set Up

From my early experiments, it was clear that all of the offerings could characterize positive conversations reasonably well.

It is nice to know that our Enterprise Social Networking (ESN) is facilitating positively reinforcing and polite online conversations, but for many of our clients, it is the early detection of negative sentiments that would provide the most value.

My quest for some negative conversation threads drew a blank on our own ESN — we are just too nice to each other! I had, however, seen some great examples online during one country’s recent national election.

It didn’t take long to find a short conversational thread to use for my testing:

Eric: “arrogant cock. Fish outta water at the BI.”

Jim: “You have spoken some true words El, few truer that those. X”

Mary: “I think I said worse to his face ... Oh Dear...”

Why did I choose this thread? Well firstly, I sense that most humans would have little trouble assessing the sentiment as totally negative.

Secondly, it contains colloquialisms, shorthand and misspellings that are typically found in online conversations between familiar participants.

Thirdly, I could see that there was a nuance here where Jim’s response to Eric was reinforcing the initial negative statement made by Eric — yet on its own, it is a positive statement.

My testing procedure, therefore, was to feed the whole thread into the different products and then retest them one statement at a time.
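
The two-pass procedure can be sketched in a few lines of Python. This is a minimal, hypothetical harness rather than any vendor's API: `run_protocol` is an illustrative name, and `toy_score` is a crude word-list scorer (its lexicon is invented for illustration) standing in for whichever product is under test.

```python
from typing import Callable

# (speaker, statement) pairs in posting order. Results are keyed by speaker
# name, so this assumes each speaker posts once, as in the test thread.
Thread = list[tuple[str, str]]
Scorer = Callable[[str], float]

THREAD: Thread = [
    ("Eric", "arrogant cock. Fish outta water at the BI."),
    ("Jim", "You have spoken some true words El, few truer that those. X"),
    ("Mary", "I think I said worse to his face ... Oh Dear..."),
]

# Hypothetical toy lexicon, for illustration only; it is not any vendor's model.
TOY_LEXICON = {"arrogant": -1.0, "worse": -1.0, "true": 1.0, "truer": 1.0}


def toy_score(text: str) -> float:
    """Average polarity of lexicon words in the text, or 0.0 if none match."""
    words = (w.strip('.,!?"“”').lower() for w in text.split())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def run_protocol(thread: Thread, score: Scorer) -> dict[str, float]:
    """Pass 1 scores the whole thread as one text; pass 2 scores each statement alone."""
    results = {"Full Thread": score(" ".join(text for _, text in thread))}
    for speaker, text in thread:
        results[speaker] = score(text)
    return results


print(run_protocol(THREAD, toy_score))
# {'Full Thread': 0.0, 'Eric': -1.0, 'Jim': 1.0, 'Mary': -1.0}
```

Even this toy scorer shows how a thread that reads as negative can wash out to a neutral full-thread score once a locally positive reply is counted.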

The Results

This is not an exhaustive market study. I only assessed those products that offered an immediate test facility. Here are the raw results, which I will discuss afterward:

Vendor	Full Thread	Eric	Jim	Mary
Lexalytics	Negative (-0.6 strength)	Negative (-0.6 strength)	Neutral (0.0 strength)	Neutral (0.0 strength)
Microsoft Cognitive Services	Neutral (50%)	Slightly Negative (45%)	Positive (84%)	Negative (21%)
ParallelDots	Positive (97%)	Positive (76%)	Positive (96%)	Negative (3%)
Selasdia	Negative	Negative	Neutral	Negative
SenticNet	Positive	Neutral	Positive	Positive
TextAnalysisOnline	Slightly Negative (-0.08)	Neutral (0.0)	Slightly Positive (0.07)	Negative (-0.4)
They Say	Negative (0.668)	Negative (0.82)	Positive (0.776)	Negative (0.948)
Twinword	Negative (-0.099)	Negative (-0.37)	Neutral (0.03)	Positive (0.105)
*Stanford University	Negative?	Neutral	Positive	Negative
* Non-vendor research institution

Assessment

Each vendor had its own way of assessing and valuing the degree of sentiment. Most vendors provided a scale from -1.0 to +1.0, or from 0 percent to 100 percent, to rate their sentiment assessments.
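
Before comparing products, it helps to put their scores on one scale. A minimal sketch, assuming a 0 to 100 percent score is a probability of positive sentiment with 50 percent as neutral (consistent with the Microsoft Cognitive Services row in the table); the function names are hypothetical, and mapping a label confidence such as They Say's onto the same scale is only a rough assumption.

```python
def percent_to_signed(percent: float) -> float:
    """Map a 0-100% score onto -1.0..+1.0, assuming 50% means neutral.

    Assumption: the percentage is a probability of positive sentiment, which
    is consistent with Microsoft Cognitive Services' "Neutral (50%)" result.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"percentage out of range: {percent}")
    return percent / 50.0 - 1.0


def label_confidence_to_signed(label: str, confidence: float) -> float:
    """Rough assumption: treat a label's confidence (0..1) as its signed strength.

    Scores like They Say's "Negative (0.668)" express confidence in the label,
    not a position on a polarity scale, so this is only a comparison aid.
    """
    sign = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}[label.lower()]
    return sign * confidence


# Microsoft Cognitive Services row: full thread, Eric, Jim, Mary.
print([round(percent_to_signed(p), 2) for p in (50, 45, 84, 21)])
# [0.0, -0.1, 0.68, -0.58]

# They Say's scores for Eric and Jim.
print(label_confidence_to_signed("Negative", 0.82), label_confidence_to_signed("Positive", 0.776))
# -0.82 0.776
```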

In terms of negative sentiment, I would have to say that nearly all performed poorly compared to my human assessment. However, when the sentiment was positive, as evidenced by Jim’s statement, they did reasonably well. The main points from my assessment are:

Sentiment analysis works far better for positive, rather than negative sentiments
None of the vendors explicitly addressed conversational context, i.e., recognizing that Jim’s statement was actually reinforcing a negative prior statement and was therefore, in reality, negative
Aggregating separate statements into an overall assessment mostly, but not always, appears to be a simplistic average across all statements, i.e., with no recognition of chat context (to be fair, this would be a big challenge in itself)
‘Neutral’ zero-weight scores appear to be given when “I’ve got no idea how to assess this”; and finally
Negative slang terms like Eric’s are not really recognized or rated, compared with, say, Mary’s statement, which, while also negative, was rated much more negatively than Eric’s
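
The averaging and "no idea" points can be illustrated together. A minimal sketch with hypothetical function names: a plain mean over per-statement scores, next to a variant that records "could not assess" as `None` instead of `0.0`, so unassessed statements do not pull the overall score toward neutral. Treating Lexalytics' two zero scores as "no idea" is an assumption made for illustration; nothing here claims to reproduce how any vendor aggregates.

```python
from statistics import fmean
from typing import Optional


def naive_thread_score(scores: list[float]) -> float:
    """Plain mean of per-statement scores; ignores who is replying to whom."""
    return fmean(scores)


def evidence_only_thread_score(scores: list[Optional[float]]) -> Optional[float]:
    """Mean over the statements the scorer could actually assess.

    None means "no idea how to assess this" and is kept distinct from a
    genuine 0.0 neutral, so it cannot drag the overall score towards zero.
    Returns None if no statement could be assessed.
    """
    known = [s for s in scores if s is not None]
    return fmean(known) if known else None


# Lexalytics' per-statement scores for Eric, Jim and Mary.
print(round(naive_thread_score([-0.6, 0.0, 0.0]), 2))  # -0.2

# The same scores, assuming (hypothetically) the two zeros meant "no idea".
print(evidence_only_thread_score([-0.6, None, None]))  # -0.6
```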

Is There a Standout Winner?

I would hesitate to name an outright winner, given the limited scope of my test and the mixed results achieved.

Also, no doubt, many vendors could improve their scores through tuning their solutions for conversational text, something we intend to explore further.

That said, They Say does have some appeal. Not only did its assessment fit my own most closely (accepting the shortcomings mentioned above), but it also had the most colorful way of communicating its results.

I have also included the results from AI research pioneer Stanford University, not because of the actual result, but because its approach is, I feel, the one with the most potential to be extended to work better in the chat context.

Stanford’s use of "Sentiment Trees" linking words and concepts could be reframed to also connect statements within a chat thread, providing the "missing link" needed for effective sentiment analysis of conversations.
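
Stanford's approach scores each node of a sentence's parse tree; carrying that idea up to a thread means treating the replies themselves as a tree. The following is not Stanford's model but a much simpler, hypothetical rule: a reply marked as agreeing with its parent takes the parent's polarity while keeping its own strength. The `agrees_with_parent` flag is supplied by hand here (detecting agreement automatically is the genuinely hard part), the scores are Microsoft Cognitive Services' percentages rescaled to -1.0..+1.0, and modelling Jim and Mary as direct replies to Eric is an assumption.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import Optional


@dataclass
class Post:
    speaker: str
    score: float  # standalone score on the -1.0..+1.0 scale
    # Hypothetical hand-supplied annotation; not produced by any vendor.
    agrees_with_parent: bool = False
    replies: list["Post"] = field(default_factory=list)


def contextual_scores(post: Post, parent: Optional[float] = None) -> dict[str, float]:
    """Walk the reply tree top-down; an agreeing reply takes its parent's polarity.

    The reply keeps its own strength (absolute value) but its sign follows the
    parent. Results are keyed by speaker, assuming one post per speaker.
    """
    own = post.score
    if post.agrees_with_parent and parent is not None and parent != 0.0:
        own = abs(own) if parent > 0 else -abs(own)
    result = {post.speaker: own}
    for reply in post.replies:
        result.update(contextual_scores(reply, own))
    return result


thread = Post("Eric", -0.1, replies=[
    Post("Jim", 0.68, agrees_with_parent=True),
    Post("Mary", -0.58, agrees_with_parent=True),
])
scores = contextual_scores(thread)
print(scores)  # {'Eric': -0.1, 'Jim': -0.68, 'Mary': -0.58}
print(round(fmean(scores.values()), 2))  # -0.45
print(round(fmean([-0.1, 0.68, -0.58]), 2))  # naive mean, no context: 0.0
```

With context, the thread averages to about -0.45 rather than roughly 0.0, which is closer to the clearly negative human reading.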

Some Final Remarks

Despite the shortcomings identified in this article, especially with respect to negative sentiment, virtually all vendors offer negative-to-positive scales and API access to their software, making the technology easy to access.

In most cases, identifying relative sentiment can suffice. I would recommend running some tests of your own to see whether the sentiments provided are discriminating enough for your own tastes.
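
A test of your own can be as simple as recording each product's label next to a human label and counting agreements. A minimal sketch with hypothetical function names, using the full-thread column of the results table (vendors only) and the human assessment of the thread as negative; collapsing graded labels such as "Slightly Negative" into plain "negative" is an assumption.

```python
# Full-thread labels from the results table (vendors only).
FULL_THREAD = {
    "Lexalytics": "Negative",
    "Microsoft Cognitive Services": "Neutral",
    "ParallelDots": "Positive",
    "Selasdia": "Negative",
    "SenticNet": "Positive",
    "TextAnalysisOnline": "Slightly Negative",
    "They Say": "Negative",
    "Twinword": "Negative",
}


def normalize_label(label: str) -> str:
    """Collapse graded labels such as 'Slightly Negative' to one of three classes."""
    text = label.lower().rstrip("?")
    for polarity in ("negative", "neutral", "positive"):
        if polarity in text:
            return polarity
    raise ValueError(f"unrecognized label: {label!r}")


def matches_human(predicted: dict[str, str], human_label: str) -> dict[str, bool]:
    """For each product, report whether its label agrees with the human label."""
    target = normalize_label(human_label)
    return {name: normalize_label(label) == target for name, label in predicted.items()}


result = matches_human(FULL_THREAD, "Negative")
print(sum(result.values()), "of", len(result))  # 5 of 8
```

Counting label matches ignores score strength and the per-statement nuances, so it is a starting point rather than a verdict.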

