September 2nd, 2009

Five Reasons Sentiment Analysis Won’t Ever Be Enough

Why is it that the social monitoring vendors that support NLP (natural language processing) for sentiment scoring will go on and on about their accuracy of 80% and up? And yet the vendors that don’t offer NLP and opt only for human analysis will tell you that sentiment analysis cannot and WILL NEVER be accurate…
A recent article in the NYTimes, “Mining the Web for Feelings, Not Facts,” called attention to both the powerful insights and the innate troubles that come with such tools. And as with all NYTimes articles that explore some aspect of our business, it has meant a lot of excited chatter in the industry blogs about the ways sentiment analysis is ramping up, and questions from detractors about whether we can ever overcome some of the basic flaws of the system.
Here is what I think: sentiment analysis won’t ever be enough, and not because of sarcasm or industry-specific slang, but because we are measuring the WRONG thing. It’s about the effect, not the content of the message. What you really want to measure is not whether a message is positive or negative, but what influence it has on the people who read it. We spend so much time worried about the mindset of the vocal few that we ignore how their message really changes the decisions of the many.
We need to understand:
1. The human language is complex, but so are people
2. A positive plus a negative does not mean neutral
3. Analysis doesn’t consider “degree” of sentiment
4. Sentiment makes no room for personal authority
5. Sentiment does not indicate action


1. The human language is complex, but so are people
Just because we hear someone talking negatively about a product doesn’t mean that we are less likely to buy it. There is a lot more to it than that. It’s about who says it, what they say, our mindset at the time, where we are viewing it, and what we take away from it.
2. A positive plus a negative does not mean neutral
If we look purely at the content of a message on a scientific basis, it would seem that a review of a product that lists both negative and positive attributes would net out as a neutral review. However, if we look at the effect of a message, I might be more likely to take the negative aspects of a review MORE seriously because the author was able to acknowledge both the pros and the cons of the product.
3. Analysis doesn’t consider “degree” of sentiment
Every product is different, and understanding what a person is looking for in that product can help prioritize the sentiment. A negative comment about the look of a phone might not have the same power as a negative comment about the battery life. And at the same time, the difference between “only 1 hour battery life!!” and “slightly disappointing battery life” is huge.
4. Sentiment makes no room for personal authority
Many analysis vendors can provide an authority ranking for a particular blog or forum based on things like reach, traffic, and interaction, but that ranking doesn’t really rate the authority the individual poster or forum member has with their audience (something like PageRank for people). The always-angry forum troll is not going to have as powerful an effect with their negative sentiment as the mildly disappointed and very respected super fan.
5. Sentiment does not indicate action
Finally, the sentiment of a posting in a forum or a blog, be it positive or negative, can’t even be trusted to indicate the ultimate action of the original poster. Someone might be reviewing an awesome flat-screen television set they would never be able to afford to buy. Or someone could be disappointed with the newest Apple product, but purchase it nonetheless.
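To make points 3 and 4 concrete, here is a rough sketch of how degree of sentiment, the importance of the attribute being discussed, and the poster's personal authority might be folded into one "estimated effect" number. This is not how any vendor is known to score posts. The `Post` fields, their scales, the example values, and the simple multiplication are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # All three fields and their ranges are hypothetical, for illustration only.
    polarity: float      # -1.0 (very negative) to +1.0 (very positive)
    topic_weight: float  # 0.0 to 1.0: how much readers care about the attribute
    authority: float     # 0.0 to 1.0: how much the audience trusts this author


def estimated_effect(post: Post) -> float:
    """Scale raw sentiment by topic importance and author authority.

    Multiplying the three factors is an assumed, deliberately naive model.
    """
    return post.polarity * post.topic_weight * post.authority


# The always-angry troll: extreme sentiment, little authority.
troll = Post(polarity=-1.0, topic_weight=0.5, authority=0.1)

# The respected super fan: mild disappointment about something that matters.
super_fan = Post(polarity=-0.3, topic_weight=0.9, authority=0.9)

for name, post in [("troll", troll), ("super fan", super_fan)]:
    print(f"{name}: {estimated_effect(post):+.3f}")
```

With these made-up numbers, the super fan's mild complaint comes out with a larger negative effect than the troll's rant, even though a plain positive/negative score would rank them the other way around.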
In conclusion, I have faith that, with the amazing steps forward in natural language processing, we can get to the place where the content of a message is accurately associated with a positive or negative score, even down to a sentence-by-sentence level. However, the bigger question is… What are we really trying to decide with this data? And are we asking the right questions?
Marta Strickland

  • Jacinta says:

    I have to agree with you Marta, it is definitely about the effect and that is hard to measure. I bet there are some software companies out there trying to come up with an algorithm that takes into consideration personal authority.

  • Justin Powell says:

    Excellent analysis Marta, thanks for sharing.
    Point 2 is so true with regard to trust and authenticity – just because there’s a ‘con’ statement doesn’t necessarily result in a negative impact!
    There are interesting findings around group polarization (the finding that groups tend to take more extreme positions than individuals). Wikipedia contends that even more extreme cases happen online. And yet a study by Dr. Bernardo Huberman at HP Labs (9:50) found the opposite to be true: product reviews (positive and negative) have been shown to collectively share a neutralized opinion rather than a more extreme position. This happens because if the visitor generally agrees with the review they will not comment. BUT if they strongly agree or strongly disagree, they will be provoked to comment.

  • Jon Reid says:

    100% right!
    Like the tagline of Luntz, Maslansky Strategic Research:
    It’s not what you say, it’s what they hear.

  • dominiq says:

    Thanks for this excellent article. I agree 100%.
    If I may add:
    1- As you pointed out, sentiment is not discrete, it’s a continuum. And there are different degrees of negative sentiment.
    So people have to make some decisions up front on how they treat mild negatives, negatives with facts, strong negatives, or even legal threats.
    2- There is “social” in social media and when consumers say they want brands to participate, I doubt that they want their messages to be read by an algorithm. There is no shortcut.
    Either you’re developing your network and social relations or not.
    3- The real paradigm shift that we see is the rise of communities. (by the way they span different infrastructures but this is another topic).
    And in communities, people tend to communicate not only to be understood but also to be differentiated. That’s a core function of tribes (cf Seth Godin).
    So the challenge of understanding context is growing faster than NLP is making progress. In some communities (most?) the English dictionary is obsolete. Look at the paroxysm with lolcats.
    Best and thanks for the post

  • Laurent says:

    I just made the same points on my own blog something like… 2 minutes ago! ;)

  • dominiq says:

    I wrote one post relative to NLP versus the growing complexity of the social web:
    http://blog.ecairn.com/2009/06/18/semantic-web-lolcat-and-japan/
    I don’t know whether it’s a “synchronicity” but it looks like we both used cats to make our point.
    We often use the pet owner’s community as data point and source of inspiration.
    Best

  • Jordan Julien says:

    I think your parting question nails the underlying issue: “What are we really trying to decide with this data?”
    If this is another vanity metric, there are others that provide more value and are readily available.
    If it’s going to influence business decisions & spearhead change, then we need a deep & complete analysis.
    I think most sentiment analysis formulas average out valuable insights business leads can observe if you just give them top-line stats. (i.e. # of conversations, conversation time line, common-words in conversations… etc)

  • Urs E. Gattiker says:

    Marta:
    The points you raise are very important indeed. But as we reported this week after conducting several tests using sentiment analysis tools, they fail to pass the benchmark.
    We present the findings here:
    Sentiment analysis for online content: honest?
    Thanks for this insightful post. Urs

  • Esteban Kolsky says:

    Great Post!
    It is getting old to try to guess what customers want or feel, and then try to bottle that into some sort of measure. I am not sure who gets really fooled into believing that computers (even analysts) can take one instance in time of one interaction and draw accurate conclusions about sentiments, feelings, and thoughts.
    The only way to measure any of those is to ask. However, since people tend to “change” their feelings for surveys, and even change them over time, isn’t it better to leave sentiments alone and just focus on what matters: delivering effectively on the customer’s agreement and expectations?
    If you measure expectations before the interaction, delivery effectiveness, and expectations and content after the interaction you will have a lot more “logical” data to follow than if you try to understand why someone is sad or happy — often unrelated to your delivery.
    Great post…thanks.

  • Doug Warner says:

    Good post, you make some good points. However, you are looking at only a limited set of areas where sentiment analysis is used. It is not all about individual review of product purchases or forum posts! Sentiment analysis is also useful in customer support or reviewing twitter for various issues.
    I think there are some situations where automated sentiment analysis is quite useful. Consider that often someone has to filter through massive amounts of information. If that information has an approximate guess for sentiment then the reviewer can optimize (or prioritize) their search of the information. Think: sentiment triggered workflow.
    You note that there are other measures that should be considered (such as authority) and you are spot on. The more automated techniques we bring to bear on the signal-to-noise problem, the better! The only reason any of these techniques are useful is to speed the human processing of massive amounts of data; it is not to replace the human analysis.
    In considering your premise, “Sentiment analysis won’t ever be enough. … It’s about the effect, not the content,” imagine you are hired to review 1000 tweets to find any people who are trashing your brand or product. You have 1 hour to identify as many of them as you can, summarize the gestalt, and respond to the 20 “most appropriate” tweets to mitigate the problem. Wouldn’t sentiment analysis sure help with that task? Of course it would also help to combine it with authority scores.
    Maybe your analysis would not be perfect in the end compared to spending a day or a week and reading all of the tweets, but in a fast paced business where your results are judged against the clock as well as against a quality measure, any reasonable tool set helps with your prioritization triage. In this case the “effect” of the message is ignored unless there is some way (such as sentiment analysis) to bring it to the attention of the person who needs to attend to it!

  • Alan Wolk says:

    I think (I hope?) that most people are using automated sentiment the way I do: as a very top-level pulse check on topics where there’s a lot of sentiment (e.g. there’s been a lot of chatter and so you’re dealing with a fairly large sample base.)
    So to use the Apple product example from above, if one of the myriad automated sentiment guides says that 83% of the commentary is positive, then we can postulate that there aren’t any serious flaws or consumer issues to date that require immediate action.
    Beyond that, it’s risky for exactly the reasons you note. Nicely done analysis.

  • Steve Wright says:

    Granted, automated sentiment analysis is not perfect. But does anyone know how not-perfect? I would really like to see some test results of measured sentiment and real live outcomes.
    For example, does anyone have a feel for how a 75% favorable sentiment score versus a 25% score correlates with consumer behavior? I’m sure the correlation is not one. But I’m pretty sure it is not zero. And if the correlation is somewhere above zero, then there does exist some kind of value. Just how we use that value is the key.

  • Ken Burbary says:

    Marta,
    I concur with the premise of the post, for both the same and different reasons. Sentiment analysis can be useful, but what good is knowing the percent of +/- conversation in a vacuum? Relying on this data out of context is very limiting. 80% are positive and 20% are negative? Ok, so what. How is an organization supposed to act on that?
    Monitoring can reveal tremendous insights, but the useful ones are those that yield actionable outcomes. Sentiment analysis is but a single variable in this equation. Combining sentiment with other variables, like conversation types and event triggers, is necessary to reveal the insights that truly matter: which types of convo produce the most volume and thus need to be higher priority. Those insights can then be used to begin a customer touch-point analysis for improving the interaction between customers and your company representatives.

  • Six Pieces about Sentiment Analysis | Delib Blog says:

    [...] has a post on these issues from September 2009. Focussing on product reviews, she identifies Five Reasons Sentiment Analysis Won’t Ever Be Enough, and concludes “What are we really trying to decide with this data? And are we asking the [...]
