The problem with automated sentiment analysis

Sentiment analysis is a complex beast, even for humans. Consider this statement: “The hotel room is on the ground floor right by the reception”. Is that neutral, positive or negative? The answer is probably that it is different things to different people. If you want a high room with a view, away from the noise of reception, the review is negative. If you have mobility issues and need a room with easy access, it is positive. And for many people it would just be information, and so neutral. Sentiment analysis is difficult even for human analysts in ambiguous or more complex situations. For social media monitoring tools it is also complicated, and not always as simple or as clear-cut as we might like or expect.

As part of our review of social media monitoring tools we compared their automated sentiment analysis with the findings of a human analyst, looking at seven of the leading social media monitoring tools: Alterian, Brandwatch, Biz360, Nielsen Buzzmetrics, Radian6, Scoutlabs and Sysomos. The outcome suggests that automated sentiment analysis cannot be trusted to accurately reflect and report on the sentiment of conversations online.

Understanding where automated sentiment analysis fails

On aggregate, automated sentiment analysis looks good, with accuracy levels of between 70% and 80%, which compares very favourably with the levels of accuracy we would expect from a human analyst. However, this masks what is really going on. In our test case on the Starbucks brand, approximately 80% of all comments we found were neutral in nature. They were mere statements of fact or information, not expressing either positivity or negativity. This proportion is common to many brands and terms we have analysed; we would typically expect the majority of discussions online to be neutral. Because neutral comments dominate the sample, a tool that handles them correctly scores well overall, however it performs on everything else. Yet these neutral discussions are typically of less interest to a brand that wants to make a decision or perform an action on the basis of what is being said online. For brands the positive and negative conversations are of most importance, and it is here that automated sentiment analysis really fails.
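To make the arithmetic concrete, here is a minimal sketch in Python. The sample is hypothetical: the counts are invented to fall within the ranges quoted in this post (about 80% neutral comments, 70% to 80% overall accuracy) and are not data from our test. The function names `accuracy` and `polar_only` are ours, for illustration only.

```python
from typing import List, Tuple

# Each pair is (human label, tool label); labels are
# "positive", "negative" or "neutral".
Pair = Tuple[str, str]


def accuracy(pairs: List[Pair]) -> float:
    """Share of comments where the tool agrees with the human analyst."""
    if not pairs:
        return 0.0
    return sum(1 for human, tool in pairs if human == tool) / len(pairs)


def polar_only(pairs: List[Pair]) -> List[Pair]:
    """Keep only comments the human analyst judged positive or negative."""
    return [(human, tool) for human, tool in pairs if human != "neutral"]


# Hypothetical sample of 100 comments, 80 of them neutral.
# These counts are illustrative assumptions, not measured results.
sample: List[Pair] = (
    [("neutral", "neutral")] * 72
    + [("neutral", "positive")] * 8
    + [("positive", "positive")] * 3
    + [("positive", "negative")] * 7
    + [("negative", "negative")] * 3
    + [("negative", "positive")] * 7
)

overall = accuracy(sample)
polar = accuracy(polar_only(sample))
print(f"overall: {overall:.0%}, positive/negative only: {polar:.0%}")
# prints "overall: 78%, positive/negative only: 30%"
```

In this invented sample the tool is right on 78 of 100 comments, but on only 6 of the 20 that actually carry sentiment. The headline figure is driven almost entirely by the neutral majority.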

No tool consistently distinguishes between positive and negative conversations

When you remove the neutral statements, automated tools typically analyse sentiment incorrectly. In our tests, when compared with a human analyst, the tools were typically about 30% accurate at deciding if a statement was positive or negative. In one case the accuracy was as low as 7%, and the best tool was still only 48% accurate when compared to a human. For any brand looking to use social media monitoring to help them interact with and respond to positive or negative comments this is disastrous. More often than not, a positive comment will be classified as negative or vice versa. In fact, no tool managed to get all the positive statements correctly classified, and no tool got all the negative statements right either.

Why this failing matters to brands

This failing of automated sentiment analysis can cause real problems for brands, especially if they base any internal workflow or process on their social media monitoring. For example, imagine that you send all your negative conversations to your Customer Care team to respond to where relevant. If two-thirds (or maybe more) of the ‘negative’ conversations sent over are actually positive then this process starts to break down. Perhaps more importantly, a lot of the negative conversations will never make it to the Customer Care team in the first place (having been incorrectly classified as positive). Unhappy customers don’t get routed to the right people and don’t get their problems dealt with. This is the complete opposite of why many of our clients want to use social media monitoring in the first place.
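The two ways this workflow breaks down can be measured separately. The sketch below, in Python, assumes a simple rule where every comment the tool labels negative is routed to Customer Care. The function name `routing_quality` and the sample counts are hypothetical, chosen to mirror the "two-thirds" example above rather than taken from our test.

```python
from typing import List, Tuple

# Each pair is (human label, tool label).
Pair = Tuple[str, str]


def routing_quality(pairs: List[Pair]) -> Tuple[float, float]:
    """Return (share of routed comments that are truly negative,
    share of truly negative comments that get routed)."""
    routed = [p for p in pairs if p[1] == "negative"]
    truly_negative = [p for p in pairs if p[0] == "negative"]
    correctly_routed = [p for p in routed if p[0] == "negative"]
    share_routed_correct = len(correctly_routed) / len(routed) if routed else 0.0
    share_negatives_caught = (
        len(correctly_routed) / len(truly_negative) if truly_negative else 0.0
    )
    return share_routed_correct, share_negatives_caught


# Hypothetical counts for illustration only, not measured results.
conversations: List[Pair] = (
    [("negative", "negative")] * 3   # unhappy customers routed correctly
    + [("positive", "negative")] * 6  # happy customers sent to Customer Care
    + [("negative", "positive")] * 7  # unhappy customers never routed
    + [("positive", "positive")] * 4
)

routed_correct, negatives_caught = routing_quality(conversations)
print(f"routed and truly negative: {routed_correct:.0%}")
print(f"negative comments that reach Customer Care: {negatives_caught:.0%}")
```

With these invented counts, only a third of what lands with Customer Care is genuinely negative, and seven in ten unhappy customers are never routed at all.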

So what can we do?

As with any test, our experiment with the Starbucks brand won’t necessarily reflect findings for every brand and term monitored online. Our test was for a relatively short time period and we only put a randomised, but relatively representative, sample of conversations through human analysis. However, even with these limitations, we were surprised by the very high level of inaccuracy shown by the social media monitoring tools investigated. For businesses looking to make decisions or perform actions on the basis of a conversation being positive or negative this is potentially quite dangerous.

Of course there is much that can be done here, and over time the tools can be trained to learn and to improve how they assess conversations about a given brand. But the overall message remains: automated sentiment analysis fails in its role of helping brands to make real decisions and to react to conversations about them online.

Read the other posts from our social media monitoring review 2010.