Most health scores are confidence theater. The team doesn't trust them, the CFO doesn't defend them, and they don't predict the thing they exist to predict. What follows is a practical, post-mortem-driven approach to building one your CS team will trust.
How most health scores get built
Someone at your company (usually a CS Ops lead inside a CS platform implementation) decides you need a health score. They open a blank canvas. They pick a few signals that feel right: login frequency, feature adoption, ticket volume, NPS. They weight them, sometimes with input from CSMs. They roll the score out. They paint it green/yellow/red.
Nobody asks the central question: does this thing actually predict renewal?
The result is a score that backtests at about 55–60% accuracy, barely better than a coin flip. A green account is almost as likely to churn as a red one. The CSMs notice within a quarter and stop trusting it. The CS leader still ships it on the QBR slide deck, but in practice nobody acts on it.
That's confidence theater. It's worse than no health score, because it gives the org false comfort.
What a health score actually has to do
A useful health score has three jobs:
1. Rank accounts by churn probability with materially better accuracy than no model.
2. Be specific enough about why an account is at risk that the CSM can act.
3. Be trustable. The CSM has to believe the model when it disagrees with their gut, and the model has to update fast enough that a fix actually moves the score.
If your health score does (1) but not (2), CSMs will dismiss it as a black box and trust their own judgement instead. If it does (1) and (2) but not (3), the model will be ignored the first time a CSM saves an account the model called red. All three legs matter.
The post-mortem method
The single most useful exercise we run when building or rebuilding a health score is what we call the cohort post-mortem.
You take the accounts that churned (did not renew) or contracted over the last twelve months. For each one, you walk backwards through its lifecycle (usually the 90 days before the renewal decision) and you ask: what signal, if we had been looking for it, would have told us this was coming?
You do this for 20–40 accounts. Then you do the same for 20–40 retained accounts in the same segment.
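One way to keep the post-mortem honest is to tally how often each candidate signal shows up in the churned cohort versus the retained one. The sketch below is only an illustration: the record layout, the signal names, and the `signal_rates` helper are all hypothetical, and the four sample accounts are made-up data, not results.

```python
# Hypothetical post-mortem notes: one record per reviewed account.
# "signals" holds the warning signs a reviewer found in the look-back window.
reviews = [
    {"account": "A", "outcome": "churned", "signals": {"sponsor_left", "single_user"}},
    {"account": "B", "outcome": "churned", "signals": {"sponsor_left"}},
    {"account": "C", "outcome": "retained", "signals": {"single_user"}},
    {"account": "D", "outcome": "retained", "signals": set()},
]

OUTCOMES = ("churned", "retained")


def signal_rates(reviews):
    """Return {signal: {outcome: share of that cohort showing the signal}}."""
    totals = {o: 0 for o in OUTCOMES}
    hits = {}
    for review in reviews:
        totals[review["outcome"]] += 1
        for signal in review["signals"]:
            counts = hits.setdefault(signal, {o: 0 for o in OUTCOMES})
            counts[review["outcome"]] += 1
    return {
        signal: {
            o: (counts[o] / totals[o] if totals[o] else 0.0) for o in OUTCOMES
        }
        for signal, counts in hits.items()
    }


rates = signal_rates(reviews)
# Print signals with the widest churned-vs-retained gap first.
for signal, r in sorted(
    rates.items(), key=lambda kv: kv[1]["retained"] - kv[1]["churned"]
):
    gap = r["churned"] - r["retained"]
    print(f"{signal}: churned={r['churned']:.0%} retained={r['retained']:.0%} gap={gap:+.0%}")
```

A signal that appears about equally often in both cohorts is probably not worth a slot in the model, however intuitive it feels. With only 20–40 accounts per cohort, treat the gaps as prompts for discussion rather than statistics.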
Patterns emerge fast. Almost every time:
- Multi-user activation matters more than login frequency. Solo logins from a single power user are a leading indicator of internal politics, not adoption.
- Executive engagement (is the buyer / sponsor still in the QBR?) is the single strongest leading signal we see, period.
- Adoption depth (using 3+ core features) beats adoption breadth.
- The slope of the trend usually beats the snapshot. An account at 70% usage trending down for 90 days is more at-risk than an account at 50% usage trending up.
- Ticket sentiment (escalations, exec-cc'd tickets) matters; raw ticket volume mostly doesn't.
The signals will vary by your product, segment, and motion. But the exercise (walking backwards through cohorts and asking what would have predicted the outcome) is what produces a model that actually works.
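The slope-versus-snapshot point is easy to make concrete. Here is a minimal sketch, assuming you have usage readings at regular intervals; the `trend_slope` helper and both sample series are hypothetical, chosen to mirror the 70%-trending-down and 50%-trending-up example.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    if n < 2:
        return 0.0  # not enough points to define a trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical usage percentages sampled every two weeks over about 90 days.
declining = [82, 80, 77, 75, 72, 70]  # ends at 70%, trending down
improving = [38, 41, 43, 46, 48, 50]  # ends at 50%, trending up

for name, series in (("declining", declining), ("improving", improving)):
    print(f"{name}: latest={series[-1]}% slope={trend_slope(series):+.2f} pts/period")
```

A snapshot ranks the first account as healthier; the slope ranks it as the one to worry about. In practice you would probably feed both the level and the slope into the model rather than replace one with the other.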
Build it backtest-first
After the cohort post-mortem, build the candidate model (five to eight signals, weighted, scored) and backtest it against the same twelve months before you ship it.
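A candidate model of this kind can be as simple as a weighted sum. The sketch below assumes each signal has already been normalized to a 0–1 value where 1 is healthiest; the signal names, the weights, and the red/green cutoffs are placeholders for illustration, not recommended values.

```python
# Hypothetical weights; yours should come out of the cohort post-mortem.
WEIGHTS = {
    "exec_engagement": 0.30,
    "multi_user_activation": 0.25,
    "adoption_depth": 0.20,
    "usage_trend": 0.15,
    "ticket_sentiment": 0.10,
}


def health_score(signals, weights=WEIGHTS):
    """Weighted average of 0-1 signal values, scaled to 0-100."""
    missing = set(weights) - set(signals)
    if missing:
        raise KeyError(f"missing signals: {sorted(missing)}")
    total_weight = sum(weights.values())
    raw = sum(weights[name] * signals[name] for name in weights)
    return round(100 * raw / total_weight, 1)


def band(score, red_below=40, green_from=70):
    """Map a 0-100 score to a color band. Cutoffs here are assumptions."""
    if score < red_below:
        return "red"
    if score >= green_from:
        return "green"
    return "yellow"


account = {
    "exec_engagement": 0.0,        # sponsor has stopped attending QBRs
    "multi_user_activation": 0.4,
    "adoption_depth": 0.6,
    "usage_trend": 0.3,
    "ticket_sentiment": 0.8,
}
score = health_score(account)
print(score, band(score))
```

Keeping the per-signal values alongside the total matters as much as the total itself: they are what lets a CSM see why an account landed in a band.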
The backtest tells you:
- What's the actual accuracy of this model? (Aim for 75%+ on the churn cohort.)
- What's the recall? (How many of the churned accounts did the model flag red?)
- What are the error rates? (False negatives: how many green accounts churned? False positives: how many red ones renewed?)
If the backtest is bad, the model is bad. Don't ship it. Iterate the weights or swap the signals and backtest again.
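The backtest numbers come from a plain confusion-matrix count. The sketch below assumes you can pair each account's historical flag (was it red before renewal?) with what actually happened; the `backtest` helper and the ten sample records are hypothetical.

```python
def backtest(records):
    """Summarize (flagged_red, churned) pairs from a historical period."""
    tp = fp = tn = fn = 0
    for flagged, churned in records:
        if flagged and churned:
            tp += 1  # flagged red, and it churned
        elif flagged:
            fp += 1  # flagged red, but it renewed
        elif churned:
            fn += 1  # not flagged, but it churned
        else:
            tn += 1  # not flagged, and it renewed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "missed_churn": fn,
    }


# Made-up history: 4 churned accounts (3 flagged), 6 renewed (2 flagged).
history = (
    [(True, True)] * 3
    + [(False, True)] * 1
    + [(True, False)] * 2
    + [(False, False)] * 4
)
print(backtest(history))
```

Because churned accounts are usually the minority, accuracy alone can look respectable for a model that flags almost nothing. Reading recall and the false-positive rate next to it guards against that.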
This sounds like a lot of work. It's three or four days of analyst time. Compared to shipping a health score that nobody trusts and quietly killing it eighteen months later (which is what the alternative looks like), it's almost nothing.
Operationalize, don't just visualize
A health score that lives in a dashboard does nothing. The score has to drive a motion:
- Red → CSM mobilizes a defined intervention playbook within 5 business days.
- Yellow → CSM logs a documented observation in the next account note.
- Green → CSM identifies expansion / advocacy opportunity in next QBR.
If your score isn't wired into a defined motion, you have a visualization, not a score. The motion is the thing that converts the model into a retention dollar.
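Wiring the score into a motion can start as a small lookup that turns a band into a task with an owner and, for red accounts, a deadline. This is a sketch under assumptions: `next_action` and `add_business_days` are hypothetical helpers, and the business-day count skips weekends only, not holidays.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date `days` business days after `start` (weekends skipped)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def next_action(band, today):
    """Map a color band to the CSM motion described above."""
    if band == "red":
        return {
            "action": "start intervention playbook",
            "due": add_business_days(today, 5),
        }
    if band == "yellow":
        return {"action": "log observation in next account note", "due": None}
    if band == "green":
        return {
            "action": "identify expansion or advocacy opportunity in next QBR",
            "due": None,
        }
    raise ValueError(f"unknown band: {band!r}")


task = next_action("red", date(2024, 1, 5))  # a Friday
print(task["action"], "by", task["due"])
```

In a real rollout the returned task would be pushed into whatever system your CSMs already work from, so that a band change creates work instead of just changing a color.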
A test you can run this week
Pull the last twelve months of churned accounts from your CRM. For each one, look at what your current health score said 90 days before the renewal date.
If more than 30% of them were green, your score isn't a score. It's a feeling, badly disguised as a number.
That test takes an analyst three hours. The result tells you whether you have a real instrument or theater.
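Once the export exists, the check itself is a few lines. The sketch below assumes one record per churned account with the band the score showed 90 days before renewal; the field name `band_90d_before_renewal` and the sample rows are hypothetical.

```python
GREEN_THRESHOLD = 0.30  # the 30% line from the test described above


def green_share_before_churn(churned_accounts):
    """Share of churned accounts that were green 90 days before renewal."""
    scored = [a for a in churned_accounts if a.get("band_90d_before_renewal")]
    if not scored:
        return None  # nothing to evaluate
    greens = sum(1 for a in scored if a["band_90d_before_renewal"] == "green")
    return greens / len(scored)


# Made-up export: 10 churned accounts, 4 of them green before renewal.
churned = (
    [{"band_90d_before_renewal": "green"}] * 4
    + [{"band_90d_before_renewal": "yellow"}] * 3
    + [{"band_90d_before_renewal": "red"}] * 3
)

share = green_share_before_churn(churned)
verdict = "theater" if share > GREEN_THRESHOLD else "instrument"
print(f"{share:.0%} of churned accounts were green: {verdict}")
```

Accounts with no recorded band are left out of the denominator here. If many churned accounts had no score at all 90 days out, that gap is worth reporting alongside the percentage.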
If the answer is theater, we can help you build one that isn't. We've done it eleven times. It's not magic. It's the post-mortem method, applied seriously.