The Numbers Do Not Speak for Themselves

analytics · change manager · practitioner · Mar 15, 2026

There is a persistent and dangerous fiction in business: that numbers speak for themselves. They do not. Numbers sit there, inert, until a person with context and judgment converts them into meaning. A spreadsheet full of survey scores does not tell you whether your change is succeeding. A correlation coefficient does not tell you whether to act. A trend line does not tell you what caused the trend. These are calculations. They are the raw material of insight, not insight itself. The crucial conversion from calculation to meaning is interpretation, and it is entirely a human act.

In our last article, If You Can Conceive of It, You Can Calculate It, we explored the remarkable capability jump in Large Language Models and how they have made sophisticated analysis accessible to anyone who can ask a question in plain language. That capability is real and transformative. But it has also created a new risk. The speed and polish of an LLM's output can make an analytical result feel authoritative when it is merely fast. The confidence of the presentation can mask the fragility of the conclusion. To put it directly: a beautifully formatted wrong answer is still wrong.

The third step in our Data-Driven Change Management process is to Analyze the Data. If you have done the work of the first two steps well (formed a specific question or hypothesis, chosen metrics that drive decisions, collected data at the right time), then you are well-positioned to analyze. The question now is not whether you can use the tools to produce an answer. You can. The greater need is to determine what that answer actually means.

The Interpreter's Advantage

Consider a physician reading a blood panel. The numbers are precise. The reference ranges are printed right on the page. A layperson could look at the results and see that a value is flagged as high or low. But the physician brings something those numbers alone cannot provide. They know the patient's history, their medications, what they ate the night before, and how they are feeling right now as they sit in the examination room. The same cholesterol number means something very different in a 28-year-old marathon runner and a 62-year-old with a family history of heart disease. The number is identical. The interpretation is not.

Change metrics work the same way. An LLM can calculate that your training completion rate dropped 15% between Wave 1 and Wave 2 of a rollout. It can even flag the decline as statistically significant. What it cannot tell you is that Wave 2 coincided with the company's busiest season, that the training platform had an outage for two days, or that a well-respected team leader openly questioned the value of the training in a town hall. Those contextual factors live in the minds of the people closest to the change. They are not in the dataset. The interpreter's advantage is not in calculation, but rather in knowing what surrounds the calculation.

This is why the human role in analysis is not diminished by AI. It is elevated. The computational burden continues to shift to the machine, freeing the change leader to do what only a human can: bring the context that transforms a number into a decision. This has been the flow from calculators to spreadsheets, and now to Artificial Intelligence.

Correlation Is Not a Conclusion

One of the most common analytical traps, made more common by how easily LLMs surface patterns, is overweighting a correlation. Ask an LLM to find relationships in your change data, and it will find them. Every time. It is built to find patterns. Some of those patterns will be meaningful. Some will be coincidence. The LLM does not know the difference. You do.

Here is a practical example. Suppose you upload your change metrics and ask the LLM: "What factors are most strongly correlated with successful adoption?" The model might return that departments with higher meeting frequency also have higher adoption scores. A tempting conclusion: "more meetings drive adoption." But anyone who has spent time in an organization knows that the relationship could run in the opposite direction. Departments that are already engaged in the change may simply meet more often because they have more to discuss. Or both variables could be driven by a third factor: perhaps those departments have leaders who are more involved, and the involvement shows up in both meeting frequency and adoption. The correlation is real. The causal story is not yet established.
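To make the third-factor problem concrete, here is a minimal sketch in Python using simulated data. The variable names (meeting_freq, adoption, leader_involvement) are our own illustrative assumptions, not a real dataset; the point is how a strong raw correlation can dissolve once the suspected confounder is accounted for.

```python
# A minimal sketch of the "third factor" problem, using simulated data.
# All variable names here are illustrative, not from any real dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Hypothetical confounder: leader involvement drives BOTH variables.
leader_involvement = rng.normal(size=n)
meeting_freq = 2.0 * leader_involvement + rng.normal(size=n)
adoption = 1.5 * leader_involvement + rng.normal(size=n)

df = pd.DataFrame({
    "meeting_freq": meeting_freq,
    "adoption": adoption,
    "leader_involvement": leader_involvement,
})

# The raw correlation looks impressive on its own...
print(df["meeting_freq"].corr(df["adoption"]))  # roughly 0.7

# ...but shrinks toward zero once we remove the part of each variable
# explained by leader involvement (a partial correlation via residuals).
resid_meet = meeting_freq - np.polyval(
    np.polyfit(leader_involvement, meeting_freq, 1), leader_involvement)
resid_adopt = adoption - np.polyval(
    np.polyfit(leader_involvement, adoption, 1), leader_involvement)
print(np.corrcoef(resid_meet, resid_adopt)[0, 1])  # near zero
```

The same check can be run inside an LLM conversation: name the factor you suspect and ask the model to control for it before you accept the pattern.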

The discipline required here is not computational. It is intellectual. When an LLM surfaces a pattern, the change leader's job is to ask: "Does this make sense given what I know about the organization?" That question cannot be outsourced. It requires the accumulated knowledge of someone who understands the culture, the politics, the history, and the personalities at play. It requires Human Intelligence.

The Narrative Fallacy in Change Management

Nassim Nicholas Taleb, in The Black Swan, described what he called the Narrative Fallacy: our tendency to construct a coherent story around random or loosely connected events and then believe the story is the truth. Humans are storytelling creatures. We are drawn to explanations that feel complete and satisfying, even when the evidence beneath them is thin.

Change management is particularly vulnerable to this fallacy. Consider the widely repeated claim that "70% of change initiatives fail." This number has been cited for decades, repeated in presentations, embedded in training materials, and used to justify consulting engagements. Yet when researchers trace the claim back to its origins, the evidence base is remarkably weak. The statistic persists, not because it is well-supported, but because it tells a compelling story, one that confirms what many people already believe about the difficulty of change. It sounds true, and that is enough for it to survive.

The same dynamic can occur in your own change analysis. An LLM produces a result that aligns with what you expected. It confirms your hypothesis. The temptation is to accept it and move on. But confirmation is not the same as proof. A well-formed hypothesis, as we discussed earlier in this series, is falsifiable. That means you should be looking not only for evidence that supports it but also for evidence that challenges it. If the data only confirms what you already believed, ask the LLM a harder question: "What in this data contradicts my hypothesis?" or "What alternative explanations could account for these results?" The model will find those, too. Your job is to weigh them honestly.
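One concrete way to pressure-test a confirming result, whether you compute it yourself or ask the LLM to run it, is a permutation test: shuffle the group labels and see how often chance alone produces a difference as large as the one you observed. The sketch below uses simulated data and hypothetical group names; it is an illustration of the technique, not a prescription.

```python
# A sketch of stress-testing a confirming result with a permutation test.
# The data is simulated; the group names are hypothetical.
import numpy as np

rng = np.random.default_rng(7)

# Adoption scores for two hypothetical groups of teams.
group_a = rng.normal(loc=3.6, scale=0.8, size=25)  # e.g., pilot teams
group_b = rng.normal(loc=3.3, scale=0.8, size=25)  # e.g., a later wave

observed = group_a.mean() - group_b.mean()

# Shuffle the labels many times and record the difference that arises
# when the grouping is, by construction, meaningless.
pooled = np.concatenate([group_a, group_b])
diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    diffs.append(pooled[:25].mean() - pooled[25:].mean())

# Fraction of chance shuffles at least as extreme as the observed gap.
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(f"observed difference: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

If chance reproduces your "confirmation" regularly, the hypothesis has not yet been tested; it has only been flattered.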

This is not about doubting every result. It is about applying the rigor that makes analysis trustworthy. A leader who presents findings that have been tested against alternatives is far more credible than one who presents findings that merely confirm a predetermined narrative. The former is analysis. The latter is advocacy adorned with data.

Three Questions for Every Analytical Result

When an LLM returns an analytical result, whether it is a correlation, a trend, a cluster, or a comparison, the change leader should ask three questions before acting on it.

First: "Does this make sense?" This is the context question. Given everything you know about the organization, the stakeholders, and the timeline, does the result align with observable reality? If the model tells you that a department with the most vocal opposition to the change also has the highest adoption scores, do not dismiss it. Investigate it. The answer may reveal something important about the difference between vocal concern and actual behavior. 

Second: "What else could explain this?" This is the alternative explanation question. For every pattern the LLM surfaces, there are usually multiple possible causes. The discipline of generating alternatives protects you from locking onto the first plausible story. In academic research, this is the practice of considering rival hypotheses. In practical change management, it is the habit of asking "what else?" before drawing a conclusion.

Third: "If I act on this, what happens?" This is the decision question, and it connects directly to the standard we have used throughout this series. Useful data is data upon which you would make a decision. If the analytical result does not change what you would do, it may be interesting, but it is not actionable. The analysis has not yet reached the threshold of information. Keep going.

These three questions do not require statistical expertise. They require judgment, curiosity, and the willingness to be wrong. They are the interpreter's toolkit.

The Three Questions in Practice

To see how these questions work together, return to the Wave 2 training example. You have uploaded your change metrics into an LLM and asked: "Given that Wave 2 training completion dropped 15% compared to Wave 1, what factors in this dataset most strongly predict whether a team completed training on time?" The model returns a clear result: teams whose managers attended the kick-off session completed training at nearly double the rate of teams whose managers did not. The correlation is strong.

"Does this make sense?" You know that manager attendance at kick-off sessions varies for many reasons. Some were traveling. Some had conflicts with quarterly reviews. But you also know, from your Observable metrics, that several managers in Wave 2 were openly skeptical about the new process. One of them said as much in a town hall. The LLM's finding aligns with something you have seen but not yet measured, the signal that managers’ visible commitment matters more than the training content itself.

"What else could explain this?" You push the model: "What other variables differ between teams with high and low completion?" It surfaces a second factor. Teams with low completion are disproportionately in the operations division, which was in the middle of its peak season. Now you have two plausible explanations. Manager commitment and operational load. They may both be true. But the operations teams whose managers attended the kick-off still completed at a higher rate than those whose managers did not, even during peak season. The timing contributed, but the manager's visible involvement carried more weight.

"If I act on this, what happens?" Here is where the analysis converts to a decision. Wave 3 is two months away. If you require manager attendance at kick-off sessions, you address the strongest predictor of completion. If you also adjust the training schedule to avoid peak operational periods, you address the secondary factor. Both are actions you can take. Both are grounded in data that has been tested against alternatives. And critically, neither is a mandate from a spreadsheet. They are decisions made by a leader who understands what the numbers mean in context.

Notice what happened through this process. The LLM did the calculating. It surfaced the correlation between manager attendance and training completion, and it flagged the divisional pattern behind the competing explanation. Those would have taken a human analyst far longer to isolate from a complex dataset. But the LLM did not know that a respected leader challenged the training publicly. It did not know, until you connected the division label to the calendar, that the operations teams were in peak season. It could not weigh the practical difference between requiring manager attendance, which costs leadership capital, and adjusting a training schedule, which costs coordination effort. Those judgments required a person who knows the organization. The LLM provided the raw material. The interpreter converted it into a plan.

Triangulation: The Interpreter's Real Advantage

The Wave 2 example illustrates something important about how interpretation actually works. The LLM analyzed one type of data, the training completion scores and associated variables that lived in the spreadsheet. But the insight that unlocked the analysis came from a different source entirely, the observable behavior of a manager who challenged the training in front of the organization.

In our courses and consulting, we advocate a three-tiered approach to change metrics. Self-Reported data captures what people say about the change: their survey responses, confidence levels, and stated readiness. Observable data captures what others can see: behaviors in meetings, participation patterns, and the body language of influential leaders. Existing Company Metrics capture what the organization's systems already track: help desk volume, error rates, productivity measures, and attrition.

No single tier tells the full story. Each one is a partial view. The power is in the balance between them. When Self-Reported data says people feel ready, but Observable data shows managers disengaging and Existing Company Metrics show error rates climbing, that disagreement is the insight. It tells you something no individual metric could reveal on its own: the change is progressing on the surface but not taking root underneath.
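As a toy illustration of reading the three tiers together, here is a sketch in Python. The thresholds and field names are assumptions made up for the example; real inputs would come from your surveys, your observation notes, and your company's systems.

```python
# A toy sketch of triangulating the three tiers. Thresholds and field
# names are invented for illustration only.
from dataclasses import dataclass

@dataclass
class TierReadings:
    self_reported_readiness: float  # e.g., average survey score, 1-5
    observable_engagement: float    # e.g., rated manager engagement, 1-5
    error_rate_trend: float         # e.g., % change in system error rate

def triangulate(t: TierReadings) -> str:
    says_ready = t.self_reported_readiness >= 3.5
    looks_engaged = t.observable_engagement >= 3.5
    systems_healthy = t.error_rate_trend <= 0
    if says_ready and not (looks_engaged and systems_healthy):
        # The disagreement IS the insight: surface change without roots.
        return "Progressing on the surface, not taking root underneath."
    if says_ready and looks_engaged and systems_healthy:
        return "All three tiers agree: the change is taking hold."
    return "Mixed or negative signals: investigate before acting."

print(triangulate(TierReadings(4.2, 2.8, +12.0)))
```

The code is trivial on purpose. The hard part is not the comparison; it is noticing that the three sources belong in the same conversation at all.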

An LLM can analyze each tier with remarkable speed. What it cannot do is hold all three in mind simultaneously and ask, "What is the story these three sources are telling when I read them together?" That is the interpreter's real advantage. Not the ability to calculate, but the ability to triangulate.

The LLM as Analytical Partner

None of this means the LLM is unreliable. It means the LLM is a tool, and like every tool, its value depends on how it is wielded. A well-prompted LLM is an extraordinary analytical partner. It can process datasets that would take a human analyst weeks. It can surface patterns that no one on the team would have thought to look for. It can run sensitivity analyses, generate alternative models, and visualize results in minutes. These are genuine capabilities, and they represent a significant advance for anyone working in data-driven change.

The key is in the prompting. A vague prompt produces a vague result. A specific, well-formed prompt produces a result that is far more likely to be useful. This echoes what we said at the start of this series: Start with a Question or Hypothesis. That principle applies not only to the overall analysis but to each individual interaction with the LLM. Instead of "analyze this data," try "Given that stakeholder sentiment in town hall meetings seems to be souring on the change, what factors in the data might explain this?" The second prompt gives the model direction and a destination.
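One lightweight way to build that discipline into your workflow is to template the prompt so that every interaction carries an observation, a question, and the fields in play. The helper below is hypothetical, a sketch of the structure rather than any particular tool's API.

```python
# A sketch of the difference a specific prompt makes. The helper and
# its fields are hypothetical; the point is the structure, not an API.
vague_prompt = "Analyze this data."

def focused_prompt(observation: str, question: str, columns: list[str]) -> str:
    # Anchor the model with what you observed, what you want to know,
    # and which fields it should reason over.
    return (
        f"Observation: {observation}\n"
        f"Question: {question}\n"
        f"Relevant columns: {', '.join(columns)}\n"
        "Cite the specific values that support each claim."
    )

print(focused_prompt(
    observation="Stakeholder sentiment in town halls is souring on the change.",
    question="What factors in the data might explain this shift?",
    columns=["sentiment_score", "dept", "town_hall_date", "attendance"],
))
```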

This is where the previous steps in the Data-Driven Change process pay their dividends. If you started with a specific question, chose metrics that matter, and collected data at the right time, then your prompts to the LLM will be naturally focused. The quality of the analysis is downstream of the quality of the preparation. There are no shortcuts here, only a process that rewards discipline at every step.

When to Trust and When to Verify

A practical question emerges from all of this: when do you trust the LLM's output, and when do you verify? The answer is not "always verify everything"; that would defeat the purpose of using the tool. Nor is it "always trust the output"; that would be negligent. The answer is calibrated trust.

For descriptive statistics (means, medians, distributions), the LLM is highly reliable. These are straightforward calculations, and modern models handle them accurately. Trust them, but spot-check occasionally by asking the model to show its work or by calculating a few values yourself.
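A spot-check can be as small as recomputing two values. Here is a sketch in Python with made-up scores; if the LLM's reported mean and median match your own calculation, your trust in the rest of its descriptive output is better earned.

```python
# A sketch of a quick spot-check: recompute a couple of values the LLM
# reported before trusting the rest. The scores here are invented.
import pandas as pd

scores = pd.DataFrame({"readiness": [3.1, 4.2, 2.8, 3.9, 4.0, 3.5, 2.9]})

# If the LLM reported mean = 3.49 and median = 3.5, verify directly:
print(scores["readiness"].mean().round(2))  # 3.49
print(scores["readiness"].median())         # 3.5
```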

For correlations and comparisons, apply the three questions above. The calculation is likely correct. The interpretation is where errors creep in. Use the LLM to generate the numbers, then use your judgment to determine what those numbers mean in context.

For causal claims or recommendations, exercise the most scrutiny. If the LLM says "Department X should increase meeting frequency to improve adoption," recognize that the model has crossed from calculation into advice. That advice is based on patterns in data, not on an understanding of your organization. Test it against what you know. Discuss it with your team. It may be directionally correct, but the specific recommendation needs the filter of human experience before it becomes a plan.

This graduated approach respects both the power of the tool and the irreplaceable value of the person using it.

Analysis as an Act of Respect

There is a dimension to all of this that goes beyond methodology. When a change leader takes the time to interpret data carefully, to check the results, consider alternatives, and ensure that the conclusions are sound before presenting them, that leader is doing something important for the people affected by the change. They are treating the decision with the seriousness it deserves. They are refusing to take shortcuts with information that will shape other people's work lives.

This is an expression of one of our Core Four Philosophies:

Leading change intentionally is simply a gesture of respect.

Rushing from calculation to conclusion is not intentional. It is reactive. Taking the time to interpret is the analytical equivalent of the care we have advocated for in every step of this process. It respects the data, it respects the people the data represents, and it respects the decisions that will follow.

When you do this well, something else happens. The people around you begin to trust the process. They see that the data is being handled with care, that it is not being used to confirm a predetermined story, and that their contributions to the data (their survey responses, their time, and their candor) are being honored. That trust is not a byproduct of good analysis. It is one of its most important products. And it is a foundation upon which Joy at Work can grow.