Why Is It So Hard to Rate Headphones?
August 8, 2019
“When SoundStage! founder Doug Schneider and I created SoundStage! Solo, we decided to try some ideas that other SoundStage! Network sites had never explored. We expanded the publishing schedule, added a comments section to the articles, and — at Doug’s suggestion, and against my hesitation — introduced ratings. Why did Doug’s suggestion worry me? Because I’d asked the “ratings or no ratings?” question so many times since I started as a tech journalist in 1989, and I’d never really come up with a satisfactory answer.
As an editor or writer, I’ve worked with (and against) ratings systems at Video magazine, Home Theater magazine, etown.com, Sound & Vision, Wirecutter, About.com, and Home Theater Review. And those are just the ones I can remember; there may have been others. Every one of them was conceived by well-meaning editors who sought to make product comparisons simpler for readers. But every one of them created problems.
What problems could befall a ratings system? Lots. For example:
- Making sure all the ratings square up: a product that gets an 8 should always be better than a product that gets a 7, even if the two were not reviewed by the same person.
- The “reviewer’s fallacy”: the tendency of reviewers to give above-average ratings to average products.
- Accounting for performance improvements: the 10 from two years ago might be an 8 today.
- Not using the whole scale: ratings on 5-point scales tend to cluster around 4, and ratings on 10-point scales cluster around 7 and 8.
- Alienating readers who think the product they bought deserves a 9, but you gave it a 7.
- Alienating manufacturers, who consider any rating below 4 on a 5-point scale, or 8 on a 10-point scale, to be the kiss of death from a marketing standpoint.
- Varying perceptions of what the scale means: On a 10-point scale, is a 5 average? Or is it like a grading scale in school, where 75% (or 7.5 out of 10) is average?
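One rough way to check whether a set of ratings suffers from the "not using the whole scale" problem listed above is to summarize how the scores are distributed. The sketch below is only an illustration: the function name `scale_usage` and the example ratings are invented here, not taken from any publication's data.

```python
from collections import Counter
from statistics import mean, pstdev


def scale_usage(ratings, low=1, high=10):
    """Summarize how much of a rating scale a list of ratings actually uses.

    `low` and `high` are the assumed ends of the scale (1 and 10 here).
    """
    counts = Counter(ratings)
    return {
        "mean": mean(ratings),              # where the ratings center
        "spread": pstdev(ratings),          # population standard deviation
        "points_used": len(counts),         # distinct scores that appear
        "points_available": high - low + 1,
        # Count for every point on the scale, including unused ones.
        "histogram": {score: counts.get(score, 0)
                      for score in range(low, high + 1)},
    }


# Hypothetical ratings, made up purely for illustration.
example_ratings = [7, 8, 7, 8, 8, 7, 9, 7, 8, 6]
summary = scale_usage(example_ratings)

print(f"mean rating: {summary['mean']}")
print(f"scale points used: {summary['points_used']} "
      f"of {summary['points_available']}")
```

A set of ratings that uses only a handful of the available points, with a mean well above the midpoint, shows the kind of clustering described in the list.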
Some of the problems of using ratings scales were recently highlighted in “A Survey and Analysis of Consumer and Professional Headphones Based on Their Objective and Subjective Performances,” a paper by Sean Olive, Omid Khonsaripour, and Todd Welti (all of Harman Research), delivered at the October 2018 Audio Engineering Society convention in New York City. In the paper, the scientists compared the results of their research-based headphone evaluations against the subjective ratings of five different publications. “The average correlation between our objective headphone sound quality ratings and those reported by five different headphone review organizations was disappointing[ly] weak (r = 0.5),” they said. Note that a correlation of 1 would be perfect. “Agreement varied from 0.17 (PC Magazine) to as high as 0.75 (Consumer Reports).”
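For readers unfamiliar with the r figure quoted above, here is a minimal sketch of a Pearson correlation coefficient, which is the usual meaning of r. It is an assumption that the paper's figure was computed this way, and the two lists of scores below are invented for illustration; they are not data from the paper or from any publication.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six headphones (illustrative only).
objective_scores = [55, 62, 70, 78, 85, 91]   # e.g. a measurement-based score
reviewer_ratings = [7, 8, 6, 8, 7, 9]         # e.g. a 10-point review rating

r = pearson_r(objective_scores, reviewer_ratings)
print(f"r = {r:.2f}")
```

A result of 1 means the two sets of scores rise and fall together perfectly, 0 means no linear relationship, and -1 means one falls exactly as the other rises.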
I expect some audio writers will protest that they don’t care if their ratings conform to those of a scientific research program, but what’s important to realize here is that the publications’ ratings don’t conform to each other, either. Nor, in many cases, can an outside observer identify any clear trends within a single publication’s ratings, or even a single reviewer’s ratings. The less structured the review program is, the greater the likelihood that products with widely differing sonic characteristics will get the same rating — or that very similar products will get different ratings.
Ever since we launched the ratings on SoundStage! Solo, we’ve been debating what they mean and what kinds of products should get what ratings. We realized it’s time to spell out exactly what our ratings mean, so that readers, manufacturers, and SoundStage! editors and writers all understand them the same way.
So going forward, here’s the scale we’ll use for our Sound ratings, listed from 1 (lowest) to 10 (highest). Note that this is an absolute scale; we won’t be elevating a product’s performance score because its price is low.
- Dysfunctional, can’t be said to fulfill its stated purpose
- Poor, barely functions as a product
- Weak, not useless, but hard to imagine who would buy it
- Significantly flawed, but may have a few attributes that make it worth a look
- Average, but a good buy only for certain tastes/needs
- Average, but still worth considering
- Good, some flaws, but still a good buy for most people
- Solid performance with a couple of notable flaws; a good buy all-around
- Truly outstanding, with a minor flaw or two
- Among the very best you can buy, not necessarily perfect but very close
For our Value ratings, we’ll use a similar scale, again listed from 1 (lowest) to 10 (highest):
- A total, obvious rip-off no one on Earth should buy
- Ridiculously overpriced; no way to make a case that it’s a good deal
- Overpriced for what it offers; hard to think of a reason why someone would buy it
- Not such a good buy unless it has some feature you especially want/need
- An OK buy, but you can definitely find a better value
- A fair deal; you’re getting what you paid for but nothing more
- A good deal, better performance/features than you’d expect for the price
- A great deal, although it’s possible you could find a better one
- A fantastic bargain for what it offers
- An unbelievable steal
Just to make sure everyone understands these new scales, we’ll be adding them to all future reviews on SoundStage! Solo. Meanwhile, if something’s not clear about the way we’re rating products going forward — or if you think you might have a better idea! — please let us know in the comments section below.”