Health Systems Action

AI doesn’t follow up

My previous article [1] discussed the fact that large language models lack a formal model of the physical world in which care takes place. This is a problem for AI in healthcare. Health care may be described in language, but care, disease and treatment are not linguistic phenomena. They are part of the physical world, governed by the laws of physics, chemistry and so on.

A second problem is that even if AI gets better at modelling the clinical world, it still has to “stay with” the patient over time. That’s because health care is rarely a single episode and a single decision. It is a sequence of decisions, actions, reactions and revisions. A patient receives a diagnosis, perhaps provisionally, starts a treatment, responds or does not, misses a dose, changes a goal, develops a side effect, or simply does not come back for a second visit. The care plan only works if someone notices what happens next.

This is where AI, as deployed today, is weak. It can answer, summarise and suggest an action, but it does not reliably follow up.

A new paper by Lin and colleagues [2] is built around this point and how AI agents fit in. The authors argue that most health AI agents – newer AI systems that are designed to take action, not only produce text – are still reactive and episodic. They handle each interaction as a separate query rather than supporting goals, addressing unresolved concerns and maintaining accountability during repeated encounters.

To address this problem, the authors propose a simple framework for “longitudinal health agents” built around four capabilities: coherence, continuity, adaptation and agency.

These are technical terms, but the idea is simple. Good care means making a sensible recommendation, remembering why it was made, checking whether it helped, changing course when circumstances change, and being clear about who is responsible for the next step.

Think about three ordinary patients.

Albert has heart failure and has just been discharged from hospital. His instructions are clear: medicines, diet advice, weight monitoring and follow-up. At home, the detailed plan turns out to be hard to execute. He misses doses, gains weight and starts to feel more breathless. Each small change in his condition is manageable on its own. Together, they start to point toward another admission.

Penny has endometriosis. Over months she tries pain medication, hormonal treatment, heat, exercise, diet changes and workarounds for bad days. Some strategies help, some do not. The task for Penny and her doctor is to establish what pattern has developed, keep track of what has already been tried, and decide what should be reconsidered.

George has bipolar disorder, mostly depression. His sleep, medication, routine, therapy and social support all affect his condition, but their effects vary: what helps one week may not help the next. His risk seems to build up gradually through poor sleep, social withdrawal, missed medication or loss of everyday structure.

These examples are clinically varied but the follow-up challenge is similar; the care of these patients depends on memory, interpretation, timing and responsibility.

Not only an AI problem

Health systems fail at this every day: test results aren’t always followed up, and treatment trials aren’t fully evaluated. Symptoms may be documented then forgotten. Responsibility is passed between hospital, specialist, primary care, family and patient until it is no longer clear who is watching over things and who is accountable.

Primary care, at its best, is designed to prevent this muddle through continuity, whole-person understanding, follow-up and attention to context. Many health systems underinvest in exactly these functions. AI is therefore entering a weak part of medicine. If it simply mimics and automates current processes, it might make care more “efficient” without making it safer or more connected.

The four capabilities [2] make the gaps in care visible and highlight the opportunities for improvement:

Coherence means maintaining a stable, structured understanding of the patient’s story. It is more than storing past chats or keeping a long transcript. A coherent system knows which facts are settled, which ideas are hypotheses, what has been tried, and how symptoms, actions and outcomes may be connected. Without coherence, the patient returns a month later and the system effectively starts again.

Continuity means carrying goals and unresolved issues forward. If a symptom was important last time, it should be revisited. If a medicine was started as a trial, someone should ask whether it helped. If weight gain after heart failure discharge is a warning sign, it should not sit in a log as an isolated data point.

Adaptation means revising the plan as circumstances change. The patient may improve, deteriorate, lose support, change priorities, or respond differently from expected. Guidelines and treatment options also change. A useful AI system should update its assumptions when new information makes the old story less plausible.

Agency means being explicit about who acts, when and under whose authority. An AI tool can remind, suggest, nudge, escalate, draft a message or contact a care team if permitted. Each action has different implications. Too little initiative leaves the patient alone with risk. Too much initiative can undermine autonomy, confuse responsibility or create unsafe dependence. A longitudinal agent has to negotiate that balance over time.
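One way to make the four capabilities concrete is to read them as requirements on what a system stores about a patient. The sketch below is illustrative only and is not taken from the paper [2]; the names `Status`, `Claim`, `OpenIssue` and `PatientStory` are hypothetical. It separates settled facts from hypotheses (coherence), keeps unresolved issues with a review date (continuity), lets a claim be revised (adaptation) and records who owns the next step (agency).

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    SETTLED = "settled"        # durable fact, e.g. a confirmed diagnosis
    HYPOTHESIS = "hypothesis"  # tentative idea that may be revised


@dataclass
class Claim:
    text: str
    status: Status


@dataclass
class OpenIssue:
    text: str
    owner: str       # who is responsible for the next step (agency)
    review_by: date  # when someone should check again (continuity)
    resolved: bool = False


@dataclass
class PatientStory:
    claims: list[Claim] = field(default_factory=list)
    issues: list[OpenIssue] = field(default_factory=list)

    def revise(self, text: str, status: Status) -> None:
        """Adaptation: update the status of an existing claim, or add it."""
        for claim in self.claims:
            if claim.text == text:
                claim.status = status
                return
        self.claims.append(Claim(text, status))

    def due(self, today: date) -> list[OpenIssue]:
        """Continuity: unresolved issues whose review date has arrived."""
        return [i for i in self.issues
                if not i.resolved and i.review_by <= today]


# Hypothetical example loosely based on Albert; the dates are made up.
story = PatientStory()
story.revise("heart failure", Status.SETTLED)
story.revise("weight gain is due to missed doses", Status.HYPOTHESIS)
story.issues.append(
    OpenIssue("recheck weight after discharge", "care team", date(2026, 1, 10))
)
overdue = [i.text for i in story.due(date(2026, 1, 12))]
```

A long transcript would hold the same words, but it would not say which statements are settled, which issues are still open, or whose turn it is to act.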

Figure 1. Care is a feedback loop. Most current AI systems support one part of the loop; longitudinal care requires staying with the loop over time.

Picture the care loop. The patient generates data: symptoms, measurements, behaviours, test results and observations. Someone interprets the data. A decision is made. An action follows. Then there is an outcome. That outcome should feed back into the next round of interpretation and action.

Most current AI tools are in one part of this loop. They help interpret information, produce text, answer a question or suggest a decision. Longitudinal care requires staying with the whole loop.

Follow-up failures

Follow-up failures are often silent; they can happen even if the discharge summary was accurate, the advice sensible, the chatbot answer reasonable. The problem only shows up later, when no one checks whether the plan happened or whether conditions changed.

The same point applies to AI. A system that gives a good piece of advice can still be unsafe if it loses the thread. A system that remembers everything can still be unhelpful if it can’t distinguish durable facts from tentative assumptions. A system that identifies risk can still create harm if responsibility for doing something is unclear.

Figure 2. A language model can answer. A world model helps situate the answer. A care model is needed to follow goals, unresolved issues and responsibility over time.

The next generation of health AI will need a world model [3] and also a care model: a representation of the patient’s evolving situation, the care plan, the open questions, the expected trajectory, the important data signals, and the responsibilities of patient, clinician and system.
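As a rough illustration of what a care model might hold, the sketch below represents a plan, open questions, and signals with an expected value and a named responsible party, and feeds each new observation back into that state. It is an assumption-laden toy, not a design from the cited papers: `Signal`, `CareModel` and `observe` are hypothetical names, and the numbers are placeholders with no clinical meaning.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str         # e.g. "weight_kg"
    baseline: float   # value the plan expects to hold
    tolerance: float  # allowed drift; would be set by a clinician
    escalate_to: str  # who acts when the signal drifts


@dataclass
class CareModel:
    plan: list[str]
    open_questions: list[str]
    signals: dict[str, Signal]

    def observe(self, name: str, value: float) -> str:
        """Feed one outcome back into the loop and name who acts next."""
        signal = self.signals[name]
        drift = value - signal.baseline
        if abs(drift) > signal.tolerance:
            # Keep the unresolved issue visible instead of leaving it in a log.
            self.open_questions.append(f"{name} drifted by {drift:+.1f}")
            return signal.escalate_to
        return "patient"  # within expectations: self-monitoring continues


# Placeholder values for illustration only, not clinical guidance.
model = CareModel(
    plan=["daily weight", "medicines as prescribed", "follow-up visit"],
    open_questions=[],
    signals={"weight_kg": Signal("weight_kg", 80.0, 2.0, "care team")},
)
next_actor = model.observe("weight_kg", 83.0)
```

The point of the sketch is the return value: every observation ends with an answer to "who acts next", which is the part of follow-up that tends to go missing.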

AI designers should not treat follow-up as a reminder feature or a memory feature. Follow-up is part of the work of care.

For Albert, a useful agent would not only explain his heart failure. It would track the discharge plan, notice weight and symptom patterns, check medication use, and know when to involve the care team.

For Penny, it would not only list endometriosis options. It would help connect symptoms, context, treatments and goals over months, while supporting her own judgement and preparing better clinical conversations.

For George, it would not only offer mental health advice. It would recognise changes in sleep, routine, mood and risk; adjust the intensity of support; and help clarify when other help is needed.

Today’s AI does not meet this high standard, but today’s health care often fails to meet it too. AI could eventually provide sustained attention over time. It could keep unresolved issues visible, make plans easier to revisit and support patients between formal encounters. It could also create new risks if it acts without authority, stores sensitive information without trustworthy governance, or turns uncertain interpretations into persistent false beliefs.

Evaluation of health AI should therefore include this question: did it stay with the problem? Until we can say yes, much of what is important in health care will remain out of reach.

References

1. Kantor G. Blame the world, not the model. Health Systems Action. 2026.

2. Lin GB, Jiang R, Elhadad N, Xu X. A longitudinal health agent framework. arXiv. 2026. arXiv:2604.12019.

3. Safavi-Naini SAA, Meftah E, Mohess J, et al. Grounding clinical AI competency in human cognition through the Clinical World Model and Skill-Mix Framework. arXiv. 2026. arXiv:2604.08226.
