Artificial intelligence (AI) is a set of technologies that will profoundly impact healthcare in all dimensions of quality.
At times, however, the AI industry and the community of healthcare quality specialists talk past each other. This is because they operate with different epistemologies: different accounts of what we treat as knowledge, how much we trust it, and how we respond when it turns out to be wrong.

Truth versus learning
Much of the AI industry is built on the idea of ground truth: that there is a correct answer, that with enough data you can approximate it, and that model performance improves as predictions get closer to that truth over time. From this perspective, and in AI language, error is deviation that must be corrected, drift is decay that must be stopped, and bias is something to eliminate.
In healthcare, however, ground truth is often no more than what a clinician recorded in the notes. That judgement is affected by context and can easily change. This connects to the improver’s view that healthcare data reflects evolving practice rather than some eternally fixed external reality.
Improvement science, influenced by the foundational ideas of W. Edwards Deming (Figure 1), asserts that conditions change, systems evolve and therefore knowledge is always provisional.
A helpful way to think about this is to picture a healthcare system as a river in constant motion. The measurements we collect – performance metrics, outcome data, model predictions – are like samples taken from the flowing water. Each sample is informative, but none captures the river itself. The value comes from understanding the direction, speed, and change in the flow over time, not from treating any single measurement as a fixed truth.

The aim is to make better predictions under current conditions, and to learn quickly and act appropriately as the river – and the system – changes.

Figure 1: W. Edwards Deming’s System of Profound Knowledge
This difference of perspective is of practical importance. To an AI team, a sepsis prediction model that degrades over time looks like “model drift”. To an improver, on the other hand, it may be evidence that workflows, staffing, patient mix or treatment thresholds have changed, and that the care system and its processes need attention.
Drift as a defect or a useful signal
In the world of an AI engineer, drift is a technical problem that requires retraining the model, adding features or refreshing the data. In improvement work, drift is often the first signal of a learning opportunity. Retraining the model without understanding the cause conceals that signal.
Healthcare systems are full of feedback loops. Introducing an AI tool changes clinician behaviour, which changes data, which feeds back into the model. Treating drift purely as a technical failure misses the feedback loop and the opportunity to improve the system itself. Treating it as a system signal encourages everyone to ask better questions about what changed, why behaviour shifted, and what unintended effects were created.
Imagine a scheduling AI that starts predicting more “no-shows” for a specific clinic. The AI engineer might retrain the model to accept this new reality. An improver would visit the clinic and realise that the bus route has changed, making it harder for patients to arrive on time. The AI highlights the symptom; the improver finds the cause.

Image: AI highlights the symptom; the improver finds the cause | Gemini
The practical implication is simple but also rather demanding: monitoring model performance needs to be paired with ongoing system learning. In healthcare, models and the systems around them evolve together.
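One way to picture this pairing is a monitoring step that opens a review for the team instead of triggering an automatic retrain. The following is a minimal sketch under stated assumptions, not a recommended implementation: the names `DriftReview` and `reviews_for`, the weekly error rates, the baseline and the tolerance are all hypothetical, and the review questions simply restate the ones raised in this essay.

```python
from dataclasses import dataclass

# Questions taken from the essay's own framing of drift as a system signal.
REVIEW_QUESTIONS = (
    "What changed in workflows, staffing, patient mix or treatment thresholds?",
    "Did clinician behaviour shift after the tool was introduced?",
    "What unintended effects might the tool have created?",
)


@dataclass
class DriftReview:
    """A prompt for system learning, opened when model performance degrades."""
    week: int
    error_rate: float
    questions: tuple = REVIEW_QUESTIONS
    retrain_approved: bool = False  # assumption: stays False until the cause is understood


def reviews_for(weekly_error_rates, baseline_rate, tolerance=0.05):
    """Open a review for each week whose error rate exceeds baseline + tolerance.

    The threshold rule is a deliberately crude, hypothetical stand-in for
    whatever monitoring a real team already uses.
    """
    return [
        DriftReview(week=week, error_rate=rate)
        for week, rate in enumerate(weekly_error_rates, start=1)
        if rate > baseline_rate + tolerance
    ]


if __name__ == "__main__":
    # Hypothetical weekly error rates for a prediction model.
    for review in reviews_for([0.10, 0.12, 0.19, 0.11], baseline_rate=0.10):
        print(f"Week {review.week}: error rate {review.error_rate:.2f}")
        for question in review.questions:
            print("  -", question)
```

The design choice worth noticing is that the monitoring output is a set of questions for people, with retraining held back until someone has looked at the system.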
Teams who have built impactful predictive models are often as concerned with what happens after a prediction is made as with the quality of the model itself. How a prediction is introduced into a workflow (for example, whether it becomes part of a team huddle, a case review, or a shared planning conversation) largely determines its value. Many model builders recognise this and are looking to improvers for help in designing workflows that support learning and improvement.
Hallucination and mental models
The AI industry tends to frame hallucinations as ungrounded output – a failure mode that needs to be suppressed. Improvers will be concerned about the clinical risk but may also see something slightly different. An unexpected result, whether produced by a model or a human, can reveal the structure of the underlying mental model.
In clinical practice, unexpected outputs can be useful precisely because they force reflection. Why did the model suggest this diagnosis? What assumptions is it making? Used carefully, anomalies become tools for learning.

Humans in the loop – or humans as the loop
The phrase “human in the loop” is common in AI governance. It implies that experts supervise, correct or override the model. Improvement thinking takes a stronger position: humans are the loop. They are not supervisory backstops but the primary agents of learning.
The purpose of AI in healthcare is to help clinicians, teams and organisations learn more quickly and more reliably. If an AI tool produces correct outputs but erodes clinical understanding, situational awareness or professional judgment, it has failed, even if the model metrics look good.
This has practical consequences for design. Automating “non-value-added” tasks is sensible, but design choices should also preserve or enhance opportunities for sense-making, reflection and learning. We want AI to be a co-pilot rather than a chauffeur.
Bias, variation, and context
The AI industry often treats bias as systematic error to be reduced by adding data. Improvers should be wary of this framing. In healthcare, what looks like bias may reflect real differences in context, resources, or patient populations.
Deming’s distinction between common and special cause variation still applies. Before adjusting a model to eliminate bias, it is often more important to understand the variation, and what it is teaching about the system.
The practical implication is restraint. Not every difference should be smoothed away, and common cause variation is an acceptable feature of a stable system. Other differences – special cause variation – point to where improvement may be needed, or where it has already happened.
The best AI practitioners, especially when teamed with clinicians and improvement specialists, realise that models are provisional. So the different world views of AI teams and improvers are best understood as a “clash of default settings” rather than a war between tribes.
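The common versus special cause distinction can be made concrete with an individuals (XmR) control chart, a standard tool in improvement work. The sketch below is illustrative only: the function names and the weekly figures are hypothetical, and it applies just one detection rule (a point outside the natural process limits), whereas practitioners usually apply several.

```python
from statistics import mean


def xmr_limits(values):
    """Return (centre, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to compute a moving range")
    centre = mean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is the conventional XmR constant (3 / 1.128).
    return centre, centre - 2.66 * mr_bar, centre + 2.66 * mr_bar


def special_cause_points(baseline, new_values):
    """Return (index, value) pairs in new_values that fall outside the baseline limits."""
    _, lower, upper = xmr_limits(baseline)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical weekly no-show percentages for one clinic.
    baseline = [10, 12, 11, 13, 12, 11, 10, 12]
    recent = [11, 14, 19, 12]
    print(xmr_limits(baseline))
    print(special_cause_points(baseline, recent))
```

With these made-up numbers only the third recent value (19) falls outside the limits. That is the point to go and ask about; the other values sit within the routine variation of a stable system.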
A different role for AI
AI is a magnifying glass that increases visibility of patterns, variation, and unexpected results. Decisions about what matters are for humans, grounded in values, purpose, context and responsibility.
AI can accelerate improvement, but only if it is embedded in a learning system that expects change, values professional judgment and treats knowledge as something to be continually revised.
Acknowledgement
The conceptual frameworks in this essay, and the extension of Deming’s System of Profound Knowledge to AI-enabled systems, originate from the 2025 IHI Forum session “Beyond Automation: Creating AI that Supports Learning” presented by Vladimir Manuel, Brandon Shelton and Jane Taylor. Any errors of interpretation are mine. Dr Manuel has an upcoming paper with additional co-authors, including Moira Inkelas.