10/30/2009

Five Stars, Zero Help

A couple of articles that came out recently - one from TechCrunch and another from MIT's Technology Review - sparked a lot of healthy debate here at Organic.

How useful and reliable are ratings and reviews? Do super users bunk up the system? Or are these systems simply set up to fail?

According to TechCrunch, five-star ratings systems (YouTube's was called out) are unreliable and inferior to other, better-defined/fewer-choices models like thumbs up/thumbs down or favoriting. The weakness of the five-star system was pegged to user subjectivity (But how can any ratings system get around that? Aren't we looking for others to share their own personal experiences and point of view in reviews?) and the tendency to vote only when one really loves (five stars!) or really hates (one star!) something.

And this is where the heated e-mail debate begins...

Fang-Yu Lin: "Yes, only people who love or hate a video enough would bother to rate it, hence the U shaped rating curve. However, one cannot simply extrapolate that to other 5-star rating systems. Amazon's customer reviews, for instance, seem more evenly distributed. My guess is that by requiring people to write a review along with the rating, herd behaviors are greatly reduced... The issue here is not the 5-star scale itself, but the entire system one designs around it."

Bridget McKinley: "But are Amazon's ratings and reviews really all that much more reliable? The MIT Technology Review article on recent Carnegie Mellon research indicates even the Amazon system has serious issues. The culprit? A handful of superusers and their bias."

Fang-Yu Lin: "Curiously, many seem to accept the thesis that if some users rate more items than the majority of users, reviews would be skewed as a result. Is it really so? For every item, each user has only one vote. It doesn't matter if certain reviewers are way more active; on a given item he or she can cast one vote, just like everyone else."
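The one-vote-per-item argument can be checked on a toy data set. The sketch below is only an illustration: the user names, item names, and star values are invented, and `item_averages` and `review_share` are hypothetical helpers, not anything taken from Amazon or the Carnegie Mellon research.

```python
from collections import defaultdict

# Hypothetical (user, item, stars) ratings, invented purely for illustration.
ratings = [
    ("power_user", "book_a", 5),
    ("power_user", "book_b", 5),
    ("power_user", "book_c", 5),
    ("casual_1", "book_a", 2),
    ("casual_2", "book_a", 3),
    ("casual_3", "book_b", 4),
]


def item_averages(rows):
    """Average stars per item; each rating counts once, whoever wrote it."""
    by_item = defaultdict(list)
    for _user, item, stars in rows:
        by_item[item].append(stars)
    return {item: sum(stars) / len(stars) for item, stars in by_item.items()}


def review_share(rows, user):
    """Fraction of all ratings in the data set written by one user."""
    return sum(1 for u, _item, _stars in rows if u == user) / len(rows)


averages = item_averages(ratings)
share = review_share(ratings, "power_user")
print(averages)
print(share)
```

In this made-up data the frequent reviewer is one voice among three on book_a, yet wrote half of all ratings and is the only voice on book_c, so how much a power user moves an item's score depends on how many other people rated that item.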

Bridget McKinley: "Of course, you can argue that even though each user only votes once, the likelihood of encountering reviews from power users is higher on any given item... Could this be a key issue hindering the reliability of reviews? Well, maybe. Conventional wisdom (along with hard data from a variety of sources) indicates that fewer than 10% of users end up generating more than 90% of the content on sites with social media functionality (functionality like ratings and reviews). That can certainly lead to volatility and sway that may undermine the very point of providing that type of service on a given site in the first place."

Fang-Yu Lin: "There is no proof that these [Amazon] frequent reviewers effectively acted as a voting bloc and submitted unvaried reviews, thus making the point moot."

Bridget McKinley: "But even if they aren't acting in a bloc, they are significantly affecting the curve. Recent studies by Bazaarvoice, Keller Fay Group, and JupiterKagan have all concluded that positive reviews tend to outweigh negative ones by an overwhelming margin (the Bazaarvoice analysis finds an 8:1 disparity)."

Fang-Yu Lin: "Looking at the average Amazon rating by item [per the Carnegie Mellon research], yes, there is a tendency toward higher scores. However, this is to be expected for a shopping site: People buy things that they perceive to be of higher value. Of course more reviews are on the positive end. (Sometimes things don't live up to expectations, hence the higher standard deviations.)"

Fang-Yu Lin: "Now look at the average rating by user: There's still a bias toward higher scores for the same reason, but the bias is much less pronounced. The standard deviations are lower here too. This seems to suggest that many Amazon reviewers are rather evenhanded."
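For readers who want to try the by-item versus by-user comparison on their own data, here is a minimal sketch of one plausible reading of it: group the same ratings two ways and compute the mean and standard deviation of each group. The ratings are invented and `summarize` is a hypothetical helper; this is not the method used in the Carnegie Mellon research, and the toy numbers are not meant to reproduce its findings.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (user, item, stars) ratings, invented for illustration only.
ratings = [
    ("u1", "item_a", 5), ("u1", "item_b", 3),
    ("u2", "item_a", 5), ("u2", "item_b", 4),
    ("u3", "item_a", 4), ("u3", "item_b", 1),
]


def summarize(rows, key_index):
    """Group star values by the chosen column (0 = user, 1 = item) and
    return {key: (mean, population standard deviation)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: (mean(stars), pstdev(stars)) for key, stars in groups.items()}


by_item = summarize(ratings, 1)  # how each item was rated
by_user = summarize(ratings, 0)  # how each user tends to rate

for label, table in (("by item", by_item), ("by user", by_user)):
    for key, (avg, spread) in sorted(table.items()):
        print(f"{label:8} {key:7} mean={avg:.2f} sd={spread:.2f}")
```

Comparing the two tables on real data would show whether the skew toward high scores comes mostly from which items get rated or from how individual reviewers rate.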

But what is really the point of all this? What is the crux of the matter?

Is the fundamental question really about the benefit or detriment of power reviewers on the user experience? After all, without them, most sites (including Amazon) would be left with a much shallower pool of user participation and feedback. Perhaps... but perhaps the question is more about how sites can best maintain usefulness and credibility in star-rated and other voting systems with power users in the mix. With around 70% of digital consumers relying on and trusting in others' opinions online, this is a problem that needs fixing fast.

Bridget McKinley
Fang-Yu Lin


Comments (2)

Dear Bridget and Lin

This is a really interesting post. Some would have suggested that explaining why the current fashion was going to end in our financial disaster would have been dangerous, especially if it then turned out that the crowd was right.

http://commetrics.com/articles/obama-spends-cash-for-trash/

The wisdom of crowds possibly herded by power users writing the first review of a product such as a computer game, e-reader or book is a real concern. In other words any attempt to turn mob opinion into the test for truth is pernicious.

A thought that a book might be a literary work that one must read because it is highly ranked by many on Amazon does not make it true.

Accordingly, just because Galileo fell out of favour the earth did not stand still. Neither has evolution been shown to be untrue by the faith of believers.

Hence, product reviews driven by super-users and crowds following the 'leaders' just means that the wisdom of crowds can only be conventional. Volume against quality:

http://commetrics.com/articles/sociotwitting-developing-metrics-for-twitter-volume-vs-twitter-influence/

Thanks for sharing this:

Urs
@ComMetrics

Good topic for discussion. My quick thoughts include...

I think most would agree that any rating and review system will have its imperfections and potential irregularities. That said, how grand is it that we can have this discussion when 10+ years ago we had to rely almost exclusively on a small number of "professional reviewers" for insights and opinions!

Something I find interesting is what the stats of participation seem to be, and how they compare to analog stats in other areas. Where many more positive reviews seem to come in via online rating and review systems, we see the opposite offline. Where 10-20% of people generate 80-90% of the content online, offline it is an even greater divide (compare the number of legit authors and content creators to readers and content consumers).

When I see a movie review in the newspaper, I consider the quality of that reviewer's past opinions and can assess whether they jibe with my personal tastes and sensibilities. Usually that can start to come through when you read more than one review by someone. That may sway me one way or another, but if I hear a so-so review and then another by a trusted source (e.g., a friend) I will be much more inclined to be influenced. Does this indicate we need to be able to review the reviewers as well? I would hope so, although likely the same challenges will be raised yet again. But it's always nice when users can be enabled to flag their "trusted reviewers".

IMHO, an imperfect system is better than none at all. And like most things, which scenario works best will depend on the context. I personally find stars and comments on YouTube videos less relevant for me, but with Amazon books, I want to see the written reviews.
