
A Test: Which Vendor Wins at Sentiment Analysis?

By Laurence Lock Lee

Natural language processing and sentiment analysis have been popular artificial intelligence (AI) research topics for decades now.

Early sentiment analysis efforts were typically applied to significant bodies of text, like movie or book reviews. In today's intelligent digital workplaces, however, we are becoming entranced by the potential of AI chatbots and the use of AI to actively participate in human-led conversations.

Assessing Conversational Threads

Short, sharp, Twitter-like exchanges, however, provide much less material for AI engines to work with. This can result in unintended negative consequences, as Microsoft found recently with its Twitter chatbot.

Perhaps we're still quite far from being able to welcome a chatbot into our day-to-day conversations, but would a less ambitious goal of assessing the sentiment contained in a discussion now be within AI’s grasp?

Using existing sentiment analysis techniques to assess conversational threads adds some significant additional challenges.

Firstly, there is usually much less text, and it is far more succinct. Secondly, there is more than one speaker, so there is likely to be a mix of sentiments being expressed. Finally, there is context between speakers as they interact, which also has to be considered.

Because my team and I at SWOOP deal in enterprise-level conversations, our clients often raise the topic of sentiment analysis. In fact, it is something we have been monitoring for some time.

In this article, I'll share some of my initial findings from testing the sentiment analysis solutions whose vendors have been brave or confident enough to offer an online evaluation facility.

Sentiment Analysis Evaluation Set Up

From my early experiments, it was clear that all of the offerings could characterize positive conversations reasonably well.

It is nice to know that our Enterprise Social Networking (ESN) is facilitating positively reinforcing and polite online conversations, but for many of our clients, it is the early detection of negative sentiments that would provide the most value.

My quest for some negative conversation threads drew a blank on our own ESN — we are just too nice to each other! I had, however, seen some great examples online during one country’s recent national election.

It didn’t take long to find a short conversational thread to use for my testing:

Eric: “arrogant cock. Fish outta water at the BI.”

Jim: “You have spoken some true words El, few truer that those. X”

Mary: “I think I said worse to his face ... Oh Dear...”

Why did I choose this thread? Well firstly, I sense that most humans would have little trouble assessing the sentiment as totally negative.

Secondly, it contains colloquialisms, shorthand and misspellings that are typically found in online conversations between familiar participants.

Thirdly, I could see that there was a nuance here where Jim’s response to Eric was reinforcing the initial negative statement made by Eric — yet on its own, it is a positive statement.

My testing procedure, therefore, was to feed the whole thread into the different products and then retest them one statement at a time.
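
To make that procedure concrete, here is a minimal sketch in Python of the kind of harness I mean. The endpoint URL, authentication scheme and response field below are placeholders, not any particular vendor's real API; each product's documentation will differ.

```python
import requests

# Placeholder endpoint and key -- substitute the real URL and auth
# scheme from whichever vendor's API you are evaluating.
API_URL = "https://api.example-vendor.com/v1/sentiment"
API_KEY = "YOUR_API_KEY"

thread = [
    ("Eric", "arrogant cock. Fish outta water at the BI."),
    ("Jim", "You have spoken some true words El, few truer that those. X"),
    ("Mary", "I think I said worse to his face ... Oh Dear..."),
]

def score(text: str) -> float:
    """Submit one piece of text and return the vendor's sentiment score."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["sentiment"]  # assumed response field

# Pass 1: the whole thread as a single block of text.
print("Full thread:", score(" ".join(text for _, text in thread)))

# Pass 2: one statement at a time.
for speaker, text in thread:
    print(f"{speaker}:", score(text))
```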


The Results

This is not an exhaustive market study. I only assessed those products that offered an immediate test facility. Here are the raw results, which I will discuss afterward:

Vendor | Full Thread | Eric | Jim | Mary
------ | ----------- | ---- | --- | ----
Lexalytics | Negative (-0.6 strength) | Negative (-0.6 strength) | Neutral (0.0 strength) | Neutral (0.0 strength)
Microsoft Cognitive Services | Neutral (50%) | Slightly Negative (45%) | Positive (84%) | Negative (21%)
ParallelDots | Positive (97%) | Positive (76%) | Positive (96%) | Negative (3%)
Selasdia | Negative | Negative | Neutral | Negative
SenticNet | Positive | Neutral | Positive | Positive
TextAnalysisOnline | Slightly Negative (-0.08) | Neutral (0.0) | Slightly Positive (0.07) | Negative (-0.4)
They Say | Negative (0.668) | Negative (0.82) | Positive (0.776) | Negative (0.948)
Twinword | Negative (-0.099) | Negative (-0.37) | Neutral (0.03) | Positive (0.105)
Stanford University* | Negative? | Neutral | Positive | Negative

* Non-vendor research institution

Assessment

Each of the vendors had different ways of assessing and valuing the degree of sentiment. Most rated their assessments on a scale from -1.0 to +1.0, or from 0 percent to 100 percent.
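
Comparing results across vendors therefore means mapping their scores onto a single scale first. Here is a minimal sketch of that conversion, assuming a score arrives either as a -1.0 to +1.0 polarity or as a 0 to 100 percent positivity with 50 percent as neutral:

```python
def to_polarity(value: float, scale: str) -> float:
    """Normalize a vendor score onto a common -1.0 (negative)
    to +1.0 (positive) polarity scale.

    scale='unit'    -> value already in [-1.0, +1.0]
    scale='percent' -> value in [0, 100], with 50 meaning neutral
    """
    if scale == "unit":
        return max(-1.0, min(1.0, value))
    if scale == "percent":
        return (value - 50.0) / 50.0
    raise ValueError(f"unknown scale: {scale}")

# Examples drawn from the results table above:
print(to_polarity(-0.6, "unit"))   # Lexalytics, full thread   -> -0.6
print(to_polarity(45, "percent"))  # Microsoft, Eric           -> -0.1
print(to_polarity(97, "percent"))  # ParallelDots, full thread -> 0.94
```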

In terms of negative sentiment, I would have to say that nearly all performed poorly compared to my human assessment. However, when the sentiment was positive, as evidenced by Jim’s statement, they did reasonably well. The main points from my assessment are:

  • Sentiment analysis works far better for positive sentiments than for negative ones
  • None of the vendors explicitly addressed conversational context, i.e. recognizing that Jim’s statement was actually reinforcing a negative prior statement and was therefore, in reality, negative
  • Aggregating separate statements into an overall assessment appears to be, mostly but not always, a simplistic average across all statements, i.e. no recognition of chat context (to be fair, this would be a big challenge in itself; see the sketch after this list)
  • ‘Neutral’ zero-weight scores appear to be given when the engine effectively has no idea how to assess the statement
  • Negative slang terms like Eric’s are not really recognized or rated: Mary’s statement, while also negative, was rated much more negatively than Eric’s
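
To see why a plain average washes out a thread like this one, consider illustrative per-statement polarities on a common -1.0 to +1.0 scale. The figures below are my own, chosen to mirror the pattern in the results table, not any vendor's output:

```python
# Illustrative per-statement polarities on a -1.0..+1.0 scale.
# These are assumed values for demonstration, not vendor output.
scores = {"Eric": -0.6, "Jim": 0.8, "Mary": -0.4}

# A plain mean over statements -- what most of the "Full Thread"
# results above appear to be doing.
average = sum(scores.values()) / len(scores)
print(f"{average:+.2f}")  # -0.07: near neutral, even though most human
                          # readers would call the thread clearly negative
```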

Is There a Standout Winner?

I would hesitate to call an out-and-out winner, given the limited scope of my test and the mixed results achieved.

Also, no doubt, many vendors could improve their scores through tuning their solutions for conversational text, something we intend to explore further.

That said, They Say does have some appeal. Not only did their assessment have the closest fit with my own (accepting the shortcomings mentioned above), but they also had the most colorful way of communicating their results.


I have also included the results from AI research pioneer Stanford University, not because of the actual result, but because its approach, I feel, has the most potential to be extended to work better in the "Chat" context.


Stanford’s use of "Sentiment Trees", linking words and concepts, could be reframed to also connect statements within a chat thread, providing the "missing link" for effective sentiment analysis of conversations.
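
As a sketch of what that reframing might look like, the structure below links statements in a reply tree and propagates sentiment through it. The propagation rule (a positive reply that agrees with a negative parent reads as negative) is my own illustration of the idea, not Stanford's algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class ThreadNode:
    """One statement in a chat thread, linked tree-style to its
    replies, by analogy with Stanford's word-level sentiment trees."""
    speaker: str
    text: str
    sentiment: float                 # standalone score, -1.0 .. +1.0
    replies: list = field(default_factory=list)

def thread_sentiment(node: ThreadNode) -> float:
    """Hypothetical propagation: read each reply in the context of its
    parent, then average the adjusted scores over the tree."""
    scores = [node.sentiment]
    for reply in node.replies:
        child = thread_sentiment(reply)
        # Agreement with a negative parent reads as negative.
        if node.sentiment < 0 and child > 0:
            child = -child
        scores.append(child)
    return sum(scores) / len(scores)

root = ThreadNode("Eric", "arrogant cock...", -0.6)
root.replies = [
    ThreadNode("Jim", "You have spoken some true words...", 0.8),
    ThreadNode("Mary", "I think I said worse to his face...", -0.4),
]
print(f"{thread_sentiment(root):+.2f}")  # -0.60 under these assumptions
```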

Some Final Remarks

Despite some of the shortcomings identified in this article, especially with respect to negative sentiment, virtually all vendors offer scales from negative to positive and API access to their software, making the technology easy to access.

In most cases, identifying relative sentiment can suffice. I would recommend running some tests of your own to see whether the sentiment scores are discriminating enough for your own purposes.

About the author

Laurence Lock Lee

Laurence Lock Lee is the co-founder and chief scientist at Swoop Analytics, a firm specializing in online social networking analytics. He previously held senior positions in research, management and technology consulting at BHP Billiton, Computer Sciences Corporation and Optimice.
