All Motion Is Not Progress: Why Your Metrics Are Measuring the Wrong Thing
A customer emails you. They want proof of ROI before their renewal. Specifically, they want page views. Login frequency. Session duration. The usage report pulled straight from your platform telemetry.
You have that data. It’s easy to pull. And something about sending it still feels wrong.
It should.
What that customer is asking for, and what most Customer Success teams have been trained to provide, is a measurement of activity dressed up as a measurement of value. Those two things are not the same, and the gap between them is where churn quietly builds.
The Proxy Metric Fallacy
In operations and systems design, there’s a documented failure mode known as surrogation, sometimes called the Proxy Metric Fallacy. It occurs when an organization substitutes a highly measurable activity for a difficult-to-measure outcome. The metric is easier to collect. It correlates loosely with the real thing. And gradually, it replaces the real thing entirely.
In Customer Success, this is everywhere. Telemetry data (logins, session duration, feature clicks, page views) is cheap, automatic, and objective. It requires no qualitative effort, no difficult stakeholder conversations, no multi-variable attribution. It flows from the product into a dashboard and shows up green.
True business value is harder. “Process accelerated by 30 hours per week.” “Error rate reduced by 45%.” “Revenue protected that would have churned without intervention.” These numbers require work to find: joint definition with the customer, baseline measurement, progress tracking over time. They don’t appear on their own.
So teams default to what’s available. The proxy becomes the measure. The measure becomes the target. And then Goodhart’s Law kicks in.
Goodhart’s Law and the Collapse of the Metric
The economist Charles Goodhart gave his name to one of the most reliable observations in systems thinking, usually quoted in the anthropologist Marilyn Strathern’s later paraphrase: when a measure becomes a target, it ceases to be a good measure.
This isn’t just a hypothesis. It’s a well-documented pattern of behaviour. The moment you tie performance evaluations, health scores, renewal confidence, or perceived customer value to a proxy metric, two things happen:
First, the humans inside the system optimize to satisfy the metric rather than the underlying goal. CSMs focus conversations on increasing login frequency rather than solving problems. Product teams build features that maximize time-in-app rather than time-to-outcome.
Second, the metric decouples from reality. The number goes up. The actual value delivered stays flat or declines. And when the renewal conversation arrives, you’re holding a dashboard that says “healthy” while the customer’s CFO is building an internal case to cut the spend.
The phrase “all motion is not progress” is the operational version of Goodhart’s Law. It separates kinetic energy (activity, busyness, movement) from directional velocity, which requires a destination. You can have enormous motion and go nowhere.
The Efficiency Paradox: Good Software Should Make You Use It Less
Here is the argument that most CS leaders avoid making out loud, because it inverts everything their health score model is built on:
If your platform is genuinely effective, the customer should spend less time in it to achieve the same outcome.
B2B software is purchased to make a process more efficient. The goal is acceleration: more output, fewer errors, faster cycles. A platform that’s working correctly automates the repetitive work, surfaces the right information instantly, and eliminates the manual steps that used to take hours.
That means high session duration (a metric almost every health score weights positively) could be evidence of a clunky user interface, poor training, workflow friction, or an overly manual process that the platform hasn’t actually solved. The customer spending 45 minutes a day in your product might be doing so because it takes 45 minutes to do what it should take 10.
The gym membership analogy makes this precise. Imagine your gym tracked your ROI using only the facility telemetry: key fob swipes and minutes on premises. You visited 20 times last month. Average session was 45 minutes. The gym manager sends you a report and calls it proof of value.
It proves nothing. You could have spent every one of those 45 minutes sitting in the lobby. The actual value of the gym is a lower resting heart rate, increased strength, a personal best on a deadlift. The fob swipes and the session duration are necessary operational inputs. You cannot get the outcome without them. But they are not the outcome.
A highly focused 20-minute workout delivers more value than 90 minutes of wandering between machines. Less motion, more progress.
The same is true in your platform.
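To make the session-duration argument concrete, here is a minimal sketch that divides time in app by work actually finished. The `UsageWeek` fields and the two accounts are hypothetical, chosen to mirror the 45-minute versus 10-minute example above; a real platform would need its own definition of a "completed task".

```python
from dataclasses import dataclass


@dataclass
class UsageWeek:
    """One week of usage for one account (hypothetical fields)."""
    minutes_in_app: float
    tasks_completed: int


def minutes_per_task(week: UsageWeek) -> float:
    """Time in app per completed task; lower suggests less friction."""
    if week.tasks_completed == 0:
        # Time was spent but nothing was finished.
        return float("inf")
    return week.minutes_in_app / week.tasks_completed


# Illustrative numbers only: five working days, one task per day.
account_a = UsageWeek(minutes_in_app=225, tasks_completed=5)  # 45 min/day
account_b = UsageWeek(minutes_in_app=50, tasks_completed=5)   # 10 min/day

print(minutes_per_task(account_a))  # 45.0
print(minutes_per_task(account_b))  # 10.0
```

A health score that rewards raw minutes would rank account A higher. Dividing by completed work reverses the ranking, which is the point of the paradox.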
The Token Usage Trap in AI
This isn’t only a Customer Success problem. The same flaw is appearing at speed inside AI tooling and automation.
“Token usage” has become the operational equivalent of “time in app” for AI workflows. The assumption is intuitive: if a workflow is consuming more tokens, it must be doing more valuable work. High usage signals high engagement signals high value.
It doesn’t.
Token usage measures computational expenditure, the raw material cost of a query. It says nothing about the quality of the output, the accuracy of the result, or the operational improvement delivered.
A poorly constructed AI workflow that consumes 50,000 tokens to extract three data points from a document is not delivering more value than a well-architected system that achieves the same extraction in 2,000 tokens. The low-usage system is superior by every meaningful measure: it delivers equivalent output with less cost, less latency, and less failure surface. The high-token workflow is wasting resources, not creating value.
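The comparison above is simple arithmetic, sketched here under the assumption that both workflows extract the same three data points correctly. The function name `tokens_per_data_point` is illustrative, not part of any AI provider’s API.

```python
def tokens_per_data_point(tokens_used: int, data_points: int) -> float:
    """Tokens spent for each data point the workflow actually extracted."""
    if data_points <= 0:
        raise ValueError("data_points must be positive")
    return tokens_used / data_points


# Figures taken from the example in the text.
sprawling = tokens_per_data_point(50_000, 3)
lean = tokens_per_data_point(2_000, 3)

print(f"{sprawling:.0f} tokens per data point")  # 16667 tokens per data point
print(f"{lean:.0f} tokens per data point")       # 667 tokens per data point
print(round(sprawling / lean))                   # 25
```

Same output, 25 times the spend. Normalizing consumption by the outcome delivered is what turns a fuel bill into a measure of efficiency.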
When teams begin evaluating their AI investment by token consumption, they replicate every mistake from the adoption-metric era of Customer Success. They reward computational sprawl over operational efficiency. They mistake the fuel bill for the miles traveled.
The Second-Order Effects
Follow the proxy metric logic to its conclusion and you find severe downstream damage.
If product teams are evaluated on time-in-app, they are incentivized to build complex multi-step workflows rather than streamlined automated solutions. Feature bloat becomes a rational strategy. The UX degrades because friction keeps people inside the product longer. You end up with a platform that is expensive to navigate because navigability would hurt the metric.
In health scoring, the damage is different but worse. A customer with high usage metrics might be red-lining the system because their internal processes are broken. They’re logging in constantly because they can’t find what they need, or they’re desperately building workarounds to compensate for a workflow the platform hasn’t solved. A CSM who looks at that dashboard, sees green, and marks the account healthy has completely missed the churn signal. The warning was there. It was disguised as a positive indicator.
False positives in health scoring (at-risk accounts scored as healthy) are more dangerous than false negatives (healthy accounts scored as at-risk). A false negative gets caught during renewal preparation. A false positive gets ignored until it’s a churn you didn’t see coming.
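One way to surface that kind of false positive is to cross-check the usage signal against the outcome the customer bought the product for. The sketch below is an assumption-heavy illustration: the `AccountSnapshot` fields, the login threshold of 10, and the 10% improvement bar are all hypothetical and would need to be agreed with each customer.

```python
from dataclasses import dataclass


@dataclass
class AccountSnapshot:
    """Hypothetical per-account data; the outcome is 'lower is better'."""
    logins_per_week: int
    baseline_outcome: float  # e.g. hours per week the process took before
    current_outcome: float   # the same measure today


def usage_looks_green(acct: AccountSnapshot, login_threshold: int = 10) -> bool:
    """What a telemetry-only health score would report."""
    return acct.logins_per_week >= login_threshold


def outcome_improved(acct: AccountSnapshot, min_improvement: float = 0.10) -> bool:
    """True if the outcome improved by at least min_improvement vs. baseline."""
    if acct.baseline_outcome <= 0:
        return False
    change = (acct.baseline_outcome - acct.current_outcome) / acct.baseline_outcome
    return change >= min_improvement


def possible_false_positive(acct: AccountSnapshot) -> bool:
    """Green on usage, but no evidence the outcome has moved."""
    return usage_looks_green(acct) and not outcome_improved(acct)


busy_but_stuck = AccountSnapshot(40, baseline_outcome=30.0, current_outcome=30.0)
quiet_but_winning = AccountSnapshot(12, baseline_outcome=30.0, current_outcome=20.0)

print(possible_false_positive(busy_but_stuck))     # True
print(possible_false_positive(quiet_but_winning))  # False
```

The first account logs in 40 times a week and has saved nothing. A dashboard built on logins alone would call it the healthier of the two.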
What to Measure Instead
The corrected framework is not complicated. It has one requirement: you need to know, for every account, what “better” actually means to that customer in specific, measurable terms.
Not “they’re satisfied.” Not “engagement is strong.” A number. Process time reduced by X hours per week. Error rate down by Y percent. Revenue per rep increased by Z.
That number is the account’s North Star Metric. From it, work backward. Identify the 2–3 workflows inside your platform that are directly connected to moving that number. Track completion, not initiation. A workflow that starts and stalls is not adoption; it’s friction you haven’t fixed yet. Then identify the input metrics that lead the North Star: the numbers that move before the outcome does, that give you early signal before the renewal conversation.
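"Track completion, not initiation" can be sketched as a simple ratio over workflow events. The event shape here, a `(run_id, status)` pair with `"started"` and `"completed"` statuses, is an assumption for illustration; your telemetry will have its own schema.

```python
def completion_rate(events: list[tuple[str, str]]) -> float:
    """Share of started workflow runs that also reached 'completed'."""
    started = {run_id for run_id, status in events if status == "started"}
    completed = {run_id for run_id, status in events if status == "completed"}
    if not started:
        return 0.0
    return len(started & completed) / len(started)


# Illustrative event log: four runs started, two finished.
events = [
    ("run-1", "started"), ("run-1", "completed"),
    ("run-2", "started"),
    ("run-3", "started"), ("run-3", "completed"),
    ("run-4", "started"),
]

print(completion_rate(events))  # 0.5
```

An initiation count would report four uses of the workflow. The completion rate reports that half of them stalled, which is the friction worth investigating.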
Telemetry data (logins, session time, feature breadth) lives in this model as supporting context. It tells you whether the customer is positioned to achieve the outcome. It is a prerequisite, not a measurement.
The customers who renew confidently are not the ones with the highest login counts. They are the ones who can articulate, in their own language and to their own leadership, what the platform did for their business. Your job is to make sure that story exists, and that you helped build the evidence for it.
Page views don’t do that. A lower resting heart rate does.
Stop measuring fob swipes. Start measuring the deadlift.

