The problem with automated sentiment analysis


Sentiment analysis is a complex beast, even for humans. Consider this statement: “The hotel room is on the ground floor right by the reception”. Is that neutral, positive or negative? The answer is probably that it is different things to different people. If you want a high room with a view, away from the noise of reception, the review is negative. If you have mobility issues and need a room with easy access, it is positive. And for many people it would just be information, and so neutral. Sentiment analysis is difficult even for human analysts in ambiguous or complex situations. For social media monitoring tools it is more complicated still, and not always as simple or as clear-cut as we might like or expect.

As part of our review of social media monitoring tools, we compared their automated sentiment analysis with the findings of a human analyst, looking at seven of the leading tools – Alterian, Brandwatch, Biz360, Nielsen Buzzmetrics, Radian6, Scoutlabs and Sysomos. The outcome suggests that automated sentiment analysis cannot be trusted to accurately reflect and report on the sentiment of conversations online.

Understanding where automated sentiment analysis fails

On aggregate, automated sentiment analysis looks good, with accuracy levels of between 70% and 80% – figures that compare very favourably with the levels of accuracy we would expect from a human analyst. However, this masks what is really going on. In our test case on the Starbucks brand, approximately 80% of all comments we found were neutral in nature: mere statements of fact or information, expressing neither positivity nor negativity. This is common to many brands and terms we have analysed; we would typically expect the majority of discussions online to be neutral. These discussions are of less interest to a brand that wants to make a decision or take an action on the basis of what is being said online. For brands, the positive and negative conversations matter most, and it is here that automated sentiment analysis really fails.
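
The way a dominant neutral class inflates the headline accuracy can be sketched with a toy calculation. The numbers below are hypothetical, chosen only to be in line with the figures above (roughly 80% neutral mentions, and roughly 30% agreement with a human on the positive/negative split):

```python
# Toy illustration: a heavily neutral dataset inflates overall accuracy.
# All numbers are hypothetical, loosely based on the figures in this post.

total = 1000
neutral = 800            # ~80% of mentions are neutral
polar = total - neutral  # 200 positive or negative mentions

neutral_correct = 760    # the tool labels most neutrals correctly (95%)
polar_correct = 60       # but only ~30% of positive/negative calls match a human

overall_accuracy = (neutral_correct + polar_correct) / total
polar_accuracy = polar_correct / polar

print(f"Overall accuracy: {overall_accuracy:.0%}")
print(f"Accuracy on positive/negative mentions only: {polar_accuracy:.0%}")
```

On these hypothetical numbers the tool reports a respectable-looking 82% overall, while being right on only 30% of the mentions a brand actually cares about.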

No tool consistently distinguishes between positive and negative conversations

When you remove the neutral statements, automated tools typically analyse sentiment incorrectly. In our tests when comparing with a human analyst, the tools were typically about 30% accurate at deciding if a statement was positive or negative. In one case the accuracy was as low as 7% and the best tool was still only 48% accurate when compared to a human. For any brand looking to use social media monitoring to help them interact with and respond to positive or negative comments this is disastrous. More often than not, a positive comment will be classified as negative or vice-versa. In fact no tool managed to get all the positive statements correctly classified. And no tool got all the negative statements right either.

Why this failing matters to brands

This failing of automated sentiment analysis can cause real problems for brands, especially if they base internal workflows or processes on the output of their social media monitoring. For example, imagine that you send all your negative conversations to your Customer Care team to respond to where relevant. If two-thirds (or more) of the ‘negative’ conversations sent over are actually positive, this process starts to break down. Perhaps more importantly, many of the genuinely negative conversations will never reach the Customer Care team in the first place, having been incorrectly classified as positive. Unhappy customers don’t get routed to the right people and don’t get their problems dealt with – the complete opposite of why many of our clients want to use social media monitoring in the first place.
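
The routing breakdown can be sketched the same way. This is a simplified, hypothetical model: it assumes equal volumes of genuinely positive and negative mentions, a binary classifier that matches the human analyst 30% of the time, and that every misclassified mention simply flips to the other class:

```python
# Hypothetical routing sketch: 'negative' mentions are forwarded to
# Customer Care, but the classifier is only ~30% accurate on
# positive/negative mentions (assumed binary: wrong calls flip class).

true_negatives = 100
true_positives = 100
accuracy = 0.30  # chance a positive/negative mention is classified correctly

# Mentions the tool labels 'negative' and routes to Customer Care:
routed_real_negatives = true_negatives * accuracy            # correctly routed
routed_misfiled_positives = true_positives * (1 - accuracy)  # wrongly routed

routed_total = routed_real_negatives + routed_misfiled_positives
share_wrong = routed_misfiled_positives / routed_total
missed_negatives = true_negatives * (1 - accuracy)  # never reach the team

print(f"{share_wrong:.0%} of the 'negative' queue is actually positive")
print(f"{missed_negatives:.0f} genuine complaints never reach Customer Care")
```

On these assumptions, roughly 70% of the queue sent to Customer Care is misfiled positives, and 70 of the 100 genuine complaints are never routed at all – consistent with the two-thirds-or-more figure above.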

So what can we do?

As with any test, our experiment with the Starbucks brand won’t necessarily reflect findings for every brand and term monitored online. Our test was for a relatively short time period and we only put a randomised, but relatively representative, sample of conversations through human analysis. However, even with these limitations, we were surprised by the very high level of inaccuracy shown by the social media monitoring tools investigated. For businesses looking to make decisions or perform actions on the basis of a conversation being positive or negative this is potentially quite dangerous.

Of course there is much that can be done here, and over time the tools can be trained to learn and to improve how they assess conversations about a given brand. But the overall message remains: automated sentiment analysis fails in its role of helping brands to make real decisions and to react to conversations about them online.

Read the other posts from our social media monitoring review 2010.

32 Responses
  • May 28, 2010

    In terms of how to pass the sentiment analysis results to customer support we have to remember that comments and feedback need to be addressed. Whether they pass through a positive/negative filter before is irrelevant. In fact, as your post suggests this is probably impossible in an efficient manner. Unfortunately customer support (or any equivalent department) has to handle such analysis basically manually.

    Dimitris May 28, 2010
  • May 28, 2010

    Matt, a good weigh up of the issues.

    I work with Idio, who do awesome work with sentiment, involving a part mechanical, part human approach.

    Ask @andjdavies if you want more info.


    Scott Gould May 28, 2010
  • May 29, 2010

    “No tool consistently distinguishes between positive and negative conversations” … of the tools you evaluated.

    Sentiment Analysis is a very challenging aspect of social media analytics. Accuracy depends on the availability of massive amounts of data. When you have 5 billion conversations to learn from, like we do @SocialRadar, the accuracy of sentiment is striking.

    @JoeTierney May 29, 2010
  • Dirk Singer
    May 29, 2010

    Excellent point, which those of us who use off the shelf tools are all too familiar with. We use ScoutLabs at Rabbit as the dashboard worked best for us…but like the others the positive / negative scoring leaves something to be desired.

    It’s not even so much that many of the negatives are wrong, it’s that 90% are lumped into neutral when they are very often clearly positive or negative. We usually revert to manual scoring and in fact I am looking at systems that just leave you to categorise every mention yourself.

    Dirk Singer May 29, 2010
  • May 30, 2010

    Let’s make it clear: the most advanced methods used in academia fail to be fully accurate (and I personally doubt whether full accuracy is possible), while being tremendously expensive from a computational standpoint. At the same time, the methods used across industrial applications tend to be rather naive, with an accuracy level of, let’s say, between 60% and 85%.
    This very fact however does not suggest that these tools fail in their role, to me at least. We are at the very beginning of an industry offering products to automate a labor intensive process that doesn’t scale up, so any departure from a random accuracy level is a systemic improvement, and I cannot but expect companies to pay for that.

    George Tziralis May 30, 2010
  • May 30, 2010

    Right now, automated sentiment technology isn’t perfect, which is why there seems to be so much talk about it. That said, it does a good job of assessing massive amounts of data in a short period of time – something that couldn’t be done manually. At the same time, the technology is improving so it should become more accurate.

    While the technology will get better, it can’t do the job alone. This is why there is a role for people to play, and how automated sentiment technology should allow for manual adjustments. This will create what I call the perfect marriage between man/woman and machines so that social media sentiment thrives.

    cheers, Mark

    Mark Evans
    Director of Communications
    Sysomos Inc.

    Mark Evans May 30, 2010
  • [...] Lately we are getting used to seeing opinion-monitoring tools that analyse whether an opinion is positive, negative or neutral. This is sentiment analysis, and while it is clearly the way forward, so too are the difficulties of this kind of analysis, based as it is on semantic interpretation and subjective opinion. That is what freshnetworks analyses here. [...]

  • Jun 1, 2010

    I completely agree with Mark from Sysomos. How to achieve that perfect marriage, though, is another story and at the moment you’ll have to contact the monitoring firms directly for the secrets!


    giles palmer Jun 1, 2010
  • Jun 1, 2010

    For our clients, human analysis has been a success. As soon as there is a problem they can let us know and we can fix it right away. Being able to identify sentiment at a detailed level within the article and then also at a global level allows for more comprehensive reports that detail what was exactly negative/positive and also why.

    Best, Michelle @Synthesio

    Michelle C Jun 1, 2010
  • Michael Hollon
    Jun 1, 2010

    Even if automated monitoring tools could magically become 100% accurate, they still fall short of being as useful as analysis done manually by actual people. That’s because consumer research needs to answer questions like Why, How, and What needs to be done. There are findings that tell you what the score of the game is and findings that help you call the plays during the game. I would not go to my client with results if all I had was the finding: x% say thumbs up and y% say thumbs down. Even if I was completely sure I had the percentages measured accurately, that doesn’t help my client make improvements. I humbly believe a researcher needs to have a more comprehensive view of business issues.

    Michael Hollon Jun 1, 2010
  • Jun 1, 2010

    Hi Matt,

    As already mentioned in other comments sentiment tools are continuously being improved and are getting better day by day.

    We @6consulting always recommend complementing sentiment analysis with the human element especially when business decisions are based on these reports.

    Olivia Landolt
    Marketing and Community Manager
    UK focused Radian6 partner

    Olivia Landolt Jun 1, 2010
  • Jun 1, 2010

    Dirk – thanks for the comment. I think your experience is common to many people. In essence a sensible application of automated tools and letting you go through the posts yourself. In fact I think there are many benefits from working through conversations manually – you learn an awful lot about your brand and your customers by doing so. Tools play a great role helping you to do this.

    Matt Rhodes Jun 1, 2010
  • Jun 1, 2010

    George – I agree that we are at the beginning and the developments made so far are great. The danger is the number of brands (including many that we meet) that base their entire social media engagement on the output of these tools. Of course this is as much about business processes as it is about the tools themselves, but it is important to remember, and to be reminded, that automated tools will only get us so far.

    Matt Rhodes Jun 1, 2010
  • Jun 1, 2010

    Mark / Giles / Olivia – Thanks for all your comments, I really appreciate them. Tools are not yet perfect and things are improving all the time, something those of us in the industry using these tools really appreciate. One of the important things is to help brands using the tools recognise (and in some cases even remember) the importance of the human analyst. The tools can help immensely, turning what could be a huge job into a manageable and organised one, but at the end of the day a human analyst has a lot to add, and can of course gain much from reading, categorising and responding to these conversations.

    Matt Rhodes Jun 1, 2010
  • Jun 1, 2010

    I think there is a clue in the term ‘social media monitoring’. Here is a definition of ‘monitoring’ that I found online:

    “Monitoring is an intermittent (regular or irregular) series of observations in time, carried out to show the extent of compliance with a formulated standard or degree of deviation from an expected norm.”

    To me, as a market researcher, ‘monitoring’, as defined above, has limited value. It might indicate something has changed, but is unlikely to fully explain WHY that change has happened. For me, to derive valuable insights and learning from online conversations, you need to mine and UNDERSTAND what is being said. From personal experience, deep understanding can only come from using human analysts. Automated tools can help find possibly relevant conversations, but you need humans to make proper and accurate sense of them.

    I also think that clients can be fooled by smart looking dashboards offered by social media monitoring companies, along with claims that thousands of sources and millions of conversations are trawled. This is all well and good, but is of little value if you don’t make proper sense of it. To that end, I agree wholeheartedly with Matt above when he says ‘I think there are many benefits from working through conversations manually – you learn an awful lot about your brand and your customers by doing so.’ You read, you understand, you learn. It’s more human effort, but it can be very rewarding.

    I’m sure the automated tools will get better over time, but as Matt’s tests show, they have quite some way to go in some areas, particularly sentiment.

    WaveMetrix, my previous employer, has five compelling reasons as to why human analysts are good at UNDERSTANDING buzz…


    Virtual Surveys

    Jon Beaumont Jun 1, 2010
  • Katie Harris
    Jun 1, 2010

    Hi Matt

    This is a really interesting post – the utility of automated SMM output, in a research context, is something I’ve often wondered about…



    Katie Harris Jun 1, 2010
  • Chris Nicholls
    Jun 2, 2010

    One of the issues of this field is that identifying the sentiment in text is in itself a subjective task. No sentiment analysis tool can be “perfect”.

    I am wondering if you compared the agreement of the sentiment labels assigned by two or more human readers? If you did, I suspect the “accuracy” would only be 80-85%.

    Would you consider releasing your labelled test cases? This data could be invaluable to the sentiment analysis research community.


    Chris Nicholls Jun 2, 2010
  • Getting serious about social media | SkiddMark
    Jun 7, 2010

    [...] oil… Anyone telling you that it’s possible to predict the future based on the past will come particularly unstuck when applying such potions to social [...]

    Getting serious about social media | SkiddMark Jun 7, 2010
  • [...] an automated analysis has been set up, you may want to use humans to check up on its accuracy. One analysis of automated classification suggests that automated analysis alone is often inaccurate in classifying terms. In order to [...]

  • Jul 5, 2010

    SMM tools are getting better and better every day – we are now able to give a statistics-driven overview based on the law of large numbers. To improve results you need linguistics professionals who can generate rules in a structured way, you need the specific milieu-based way of speaking, and it is ultimately an iterative process.

    Martin Krotki Jul 5, 2010
  • Valery
    Jul 22, 2010

    Matt – great concise article – thanks!

    You are spot on: accuracy numbers applied to the entire set (including neutrals) are totally misleading. In addition, no metrics I’ve seen to date mention recall – what percentage of the truly positive/negative messages the engine picks up. That can be really important if what you’re trying to do is locate and respond to every possible negative message about your brand…

    I see an evolution of the tools into something that allows people to dial the line between accuracy and recall in or out – the fundamental trade-off in all these algorithms. For a threat-tracking app, low accuracy might be OK (we’ll throw humans into the mix to fix that) but recall has to be high. For a major KPI/metric based on % positive/negative, low recall might be OK but accuracy has to be high. Etc.

    As someone else mentioned, would be great to get ahold of your labeled test set. Perhaps we can connect offline?


    Valery Jul 22, 2010
  • Aug 31, 2010

    Truth is, you cannot completely remove humans from the equation simply because conversations are essentially human, made up of language, emotion, and opinion. It may be possible in the future to get to 99.9% regardless of situation, language, etc. But in the meantime, human sentiment analysis is a good option to have.

    Infinit-O Aug 31, 2010
  • Oct 8, 2010

    I came across this interesting post just now – probably a tad late to comment :), but anyway. Sentiment analysis is an inherently complex NLP-based effort. However, people want to know not only where the sentiment is moving, but also why. Identifying the key drivers of sentiment requires semantic analysis.

    keywitness Oct 8, 2010
  • Reflections on social strategies in b2b | PeerIndex Blog
    Nov 4, 2010

    [...] sentiment analysis doesn’t work (as other agencies have discovered). In the case of this test, Metrica noted that one of the companies accurately gauged sentiment 29% [...]

    Reflections on social strategies in b2b | PeerIndex Blog Nov 4, 2010
  • The problem with automated sentiment analysis « Wordpress Test
    Dec 3, 2010

    [...] the social web to understand and react to what their customers are saying about their brand. This post looks at the failings of social media monitoring tools and why sentiment analysis cannot be trusted [...]

    The problem with automated sentiment analysis « Wordpress Test Dec 3, 2010
  • [...] In general, these types of tools do a pretty good job, and are around 70-80% accurate.  This social media sentiment article is a good write-up on sentiment and sentiment accuracy using text-scrubbing [...]

  • [...] positive comments in a different way by a different set of people. We’ve written before about the problem with automated sentiment analysis and the best advice is to make sure that you keep a level of human involvement and analysis to make [...]

  • sean poulley
    Oct 21, 2011

    Have been researching social media analysis and social media monitoring and I think this is very astute. I do, however, think that ListenLogic solves this problem.

    sean poulley Oct 21, 2011
  • Cyborg Sentiment Analysis « NJWResearch
    Nov 7, 2011
    Cyborg Sentiment Analysis « NJWResearch Nov 7, 2011
