Why NPS is a bad signal for product decisions

Picture a founder on a Monday morning. Her B2B SaaS product has around 400 users. Last quarter she sent her first NPS survey; 38 people answered, and the score came back at 42. She checks the benchmark sites, sees that 42 looks strong for SaaS, screenshots the number for her investor update, and moves on to the roadmap with a quiet sense of relief. Users are happy. The product is working.

Three weeks later, two of her ten largest accounts churn.

Neither of them answered the survey.

This is the trouble with Net Promoter Score. It is not that the number is always wrong. It is that the number arrives looking like an answer when it is, at best, an unopened question. For a founder making product decisions — what to build, what to fix, who to build for — NPS is one of the weakest signals you can collect.

It measures a claimed intention from a self-selected sample, compresses it into a lossy score, and tells you almost nothing about why people behave the way they do.

Where the one number came from

NPS was introduced in 2003, when Bain consultant Fred Reichheld published “The One Number You Need to Grow” in Harvard Business Review. The idea was elegant: ask customers one question (how likely are you to recommend us to a friend or colleague?) on a 0–10 scale. Call the 9s and 10s promoters and the 0–6s detractors, subtract the percentage of detractors from the percentage of promoters, and you have a single number that, Reichheld claimed, predicts company growth.

Executives loved it, for understandable reasons. It is cheap to measure, easy to track, and it compresses something as messy as customer feeling into one comparable figure. Its spread is easy to see: Nielsen Norman Group’s 2024 NPS overview treats the recommend-us prompt as a common web encounter and analyses NPS as a standard customer-loyalty metric. But widespread adoption is not evidence that the metric is good.

From there the metric trickled down. Today, founders with a few hundred users run NPS surveys because it feels like the responsible, grown-up thing to do. But the original claim — that this number predicts growth — deserved far more scrutiny than it got.

The evidence never held up

Here is something many NPS users never learn: the studies behind the 2003 article were not published in full with public data. Itamar Gilad summarises the history clearly: the pivotal studies were not subjected to normal peer-review scrutiny.

When independent researchers tried to replicate the finding, the superiority claim weakened. In 2007, Timothy Keiningham and colleagues published a longitudinal study in the Journal of Marketing using data from 21 firms and more than 15,500 customer interviews. Working with industries Reichheld had held up as exemplars, they failed to replicate the clear superiority of NPS over other measures. Plain customer satisfaction — the unfashionable metric NPS was meant to replace — often predicted growth just as well or better.

Jeff Sauro of MeasuringU, who is notably fair-minded about NPS, reviewed the published evidence and found a subtler problem: in many studies claiming NPS predicts growth, the score was correlated with historical or near-term revenue, not future growth. Successful companies have customers who recommend them. That tells you much less about whether the score can see forward.

Jared Spool was blunter in his widely shared essay “Net Promoter Score Considered Harmful”: NPS does not give management a reliable read on loyalty, growth, or what to fix.

For product teams, that last clause is the point. Even if NPS were a decent board-level health check, it would still be a poor roadmap input.

The maths throws away your answers

Set the growth claims aside and look at what the calculation does to the answers people give you.

Eleven response options get collapsed into three bins:

9–10: promoters
7–8: passives
0–6: detractors

Then the passives drop out of the subtraction; they count only towards the total that the percentages are taken from. A 0 counts exactly the same as a 6. A 9 counts the same as a 10.

Nielsen Norman Group gives a vivid example of what this destroys. Suppose you ship an improvement and your most unhappy users move from scoring you 2 to scoring you 5. That is a real change. People who hated the product now merely dislike it, and their reasons have probably changed too. Your NPS records nothing. They are still detractors. You made the product better and the metric shrugged.
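To make the binning concrete, here is a minimal Python sketch of the standard calculation, applied to the 2-to-5 scenario. The 38 responses are invented for illustration (10 unhappy users, 8 passives, 20 promoters), and the function names are our own.

```python
def nps(scores):
    """Net Promoter Score (-100 to 100) for a list of 0-10 ratings."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)   # 9-10
    detractors = sum(1 for s in scores if s <= 6)  # 0-6
    # Passives (7-8) appear only in the denominator.
    return 100 * (promoters - detractors) / len(scores)


def mean(scores):
    """Plain average of the raw ratings, for comparison."""
    return sum(scores) / len(scores)


# Hypothetical data: 10 unhappy users, 8 passives, 20 promoters.
before = [2] * 10 + [7] * 8 + [9] * 20
# The same users after an improvement lifts the unhappy group from 2 to 5.
after = [5] * 10 + [7] * 8 + [9] * 20

print(f"NPS:  {nps(before):.1f} -> {nps(after):.1f}")
print(f"Mean: {mean(before):.2f} -> {mean(after):.2f}")
```

The NPS is identical before and after, because every 2 and every 5 falls in the same bin. The plain average of the same ratings rises by a little under a point. The improvement is in the raw answers; the score simply cannot see it.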

The binning has a second cost that bites small companies hardest: throwing away information means you need much larger samples before changes in the score mean anything. NN/g notes that researchers must drastically increase sample sizes to get statistically relevant information out of NPS. If you have 400 users and 38 responses, a quarter-on-quarter move from 42 to 31 may simply be noise. The founder who reorganises her roadmap around that drop is reacting to static.
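A rough way to see how much static 38 responses carry is to estimate a margin of error. The sketch below treats each response as +1 (promoter), 0 (passive), or -1 (detractor) and uses a normal approximation. The split of 22 promoters, 10 passives, and 6 detractors is an assumption chosen to produce a score of about 42; the source gives only the total and the score. It also ignores non-response bias, which makes the real picture worse.

```python
import math


def nps_margin_of_error(promoters, passives, detractors, z=1.96):
    """Return (score, margin) in NPS points, using a normal approximation.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    n = promoters + passives + detractors
    if n == 0:
        raise ValueError("need at least one response")
    p = promoters / n
    d = detractors / n
    score = p - d
    # Variance of a single response coded as +1 / 0 / -1.
    variance = p + d - score ** 2
    standard_error = math.sqrt(variance / n)
    return 100 * score, 100 * z * standard_error


# Hypothetical split of the founder's 38 responses.
score, margin = nps_margin_of_error(promoters=22, passives=10, detractors=6)
print(f"NPS {score:.0f}, plus or minus about {margin:.0f} points")
```

Under these assumptions the margin comes out at roughly 24 points either side of 42, so a reading of 31 next quarter sits comfortably inside the range you would expect from sampling noise alone.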

And the score moves for reasons that have nothing to do with your product. A Qualtrics XM Institute study of 17,509 consumers across 18 countries found large cross-country differences in how people use the scale, even for companies they like. Same affection, different scoring culture. If your user base shifts from one geography to another, your score can move while nothing about the product changes.

Who answers — and who quietly does not

This is the part of the story closest to our founder, because it is not a maths problem. It is a candour problem.

An NPS score is built from the people who chose to answer. Rob Markey — Reichheld’s co-author and one of the architects of the Net Promoter System — has acknowledged the bias openly: promoters are often more likely to respond, while detractors are less likely. Bain’s own guidance says response rates below 40% for consumer businesses and below 60% for B2B are a warning sign for reliability.

Now look honestly at a startup survey. If your in-app NPS prompt gets 10% of users to answer, it is far below the reliability threshold NPS advocates themselves describe for B2B. If it gets 5%, the problem is sharper. The users drifting towards churn — the ones whose stories you most need to hear — are often the ones dismissing the popup.

That asymmetry matters. The survey can systematically over-sample fans and return a flattering number built on the silence of everyone else. Our founder with a score of 42 did not learn that her users were happy. She learned that her responders were happy.

Run the arithmetic on her situation. 400 users, 38 responses: a 9.5% response rate. If the quiet accounts losing interest had answered alongside the fans, the same product on the same day might have scored very differently. Neither score would have named a single thing worth fixing.
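The same arithmetic as a short Python sketch. Only the 400 users, 38 responses, and score of 42 come from the story. The split among responders and the ratings the silent users would have given are invented to show the size of the effect, not to estimate it.

```python
def nps_from_counts(promoters, passives, detractors):
    """NPS (-100 to 100) from counts of each response type."""
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("need at least one response")
    return 100 * (promoters - detractors) / total


USERS = 400

# (promoters, passives, detractors)
responders = (22, 10, 6)  # hypothetical split of the 38 real responses
silent = (2, 6, 12)       # hypothetical: 20 quiet users, had they answered

response_rate = sum(responders) / USERS
combined = tuple(r + s for r, s in zip(responders, silent))

print(f"Response rate: {response_rate:.1%}")
print(f"Responders only: {nps_from_counts(*responders):.0f}")
print(f"With 20 quiet users added: {nps_from_counts(*combined):.0f}")
```

With these made-up numbers, hearing from just 20 more users, 5% of the base, pulls the score from the low 40s to around 10. The product did not change between the two lines of output; only who answered did.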

Then there is gaming. Because NPS is simple and visible, it attracts manipulation the way all high-stakes single numbers do. NN/g points to Campbell’s Law: the more a metric matters in decision-making, the more it gets manipulated. Support reps mention the survey only after calls that went well. Car dealers plead for a ten. Teams learn to improve the score instead of improving the experience.

At enterprise scale, this corrupts dashboards. At startup scale, it can be as subtle as the founder personally emailing the survey to the users she knows will be generous.

Saying is not doing

Underneath everything, NPS measures a claim about a hypothetical: how likely are you to recommend us, someday, to someone? Decades of research on stated intentions tell us to be careful. What people predict they will do and what they actually do are different things. We have written before about the gap between observed and reported behaviour, and recommendation is one of the clearest cases.

In 2007, V. Kumar and colleagues studied actual referral behaviour at a telecom firm and a financial-services firm and published the results in HBR as “How Valuable Is Word of Mouth?” Their finding was uncomfortable for NPS users: high-purchasing customers who said they would recommend often did not actually refer.

So the chain of inference behind an NPS-driven product decision runs like this: a biased sample of your users makes an unreliable prediction about a hypothetical future action, the answers are compressed into three bins that erase most of their information, and the resulting number is compared against benchmarks that vary by industry, country, and survey channel.

Then someone in a roadmap meeting says, “NPS dipped this quarter, we should do something about onboarding.”

Even the inventor moved on

Perhaps the strongest evidence against using NPS the way many companies use it comes from Fred Reichheld himself.

In 2021, eighteen years after the original HBR article, he co-wrote “Net Promoter 3.0”, arguing for a new headline metric called earned growth rate. The important shift is that earned growth is built from audited accounting data: revenue from returning customers and revenue from customers they actually refer. In other words, behaviour.

Gilad also quotes Reichheld in a later interview making the point more plainly: if a customer comes back, gives more business, and refers others, that is a promoter; you do not need a survey to tell you.

That is the lesson founders should take. Behaviour beats the survey.

What to use instead

If you are a founder with somewhere between 50 and 1,000 users, here is the honest position: you probably do not have enough respondents for NPS to be statistically meaningful, the respondents you get may skew towards fans, and the number cannot tell you what to build anyway. Three things will serve you better.

Watch what users do. Retention, activation, frequency of use, expansion, support volume, invite completion, project creation — these are records of behaviour, not predictions. They are harder to game and impossible to be polite to. They will not explain themselves, but they will tell you where to look.

If you want one survey question, ask a sharper one. When Rahul Vohra was searching for product/market fit at Superhuman, he skipped NPS and used Sean Ellis’s question instead: how would you feel if you could no longer use the product? The First Round Review write-up is useful because the score was not the point. The follow-ups were: who benefits most, what is the main benefit, what holds you back? Those answers wrote the roadmap.

Mostly, though: talk to your users. A score, even a good one, cannot tell you why. The founder whose enterprise accounts churned did not need 38 ratings; she needed eight honest conversations — a few with people who had gone quiet, a few with people who scored her a 7, a few who never answered at all.

What does a 7 mean? It means “there is something I am not telling you.” The only way to find out is to ask, follow up, and listen to the story behind the number:

Walk me through the last time the product let you down.
What were you trying to get done?
What did you do next?
Who else was involved?
What workaround did you use?

The answers to those questions arrive with context attached. Ten of those conversations will hand you a sharper roadmap than a thousand ratings, because each one tells you not just that something is wrong but what, for whom, and how much it matters.

That is the research that moves a roadmap: themes, incidents, and reasons, gathered from the people the survey never reached.

A score is a question, not an answer

None of this means you must never measure sentiment. Tracked humbly, over time, alongside behavioural data, an NPS trend can flag that something shifted. Treat it like a smoke alarm, not a diagnosis.

The failure mode is treating it as a decision-making input on its own: celebrating a 42, panicking at a 31, reshuffling a roadmap because a biased sample of users moved a few points on a question about a hypothetical.

Your users are not numbers between 0 and 10. They are people with stories about workarounds at 11pm, about the export that failed before a board meeting, about why they nearly left in March and what made them stay.

NPS compresses all of that into a single integer and discards the middle. The real signal was never in the score. It was in the conversations you did not have.