Health Systems Action

World models and word models: layers of the AI cake

Development of artificial intelligence (AI) is moving along two distinct paths. The last decade has been centred on large language models (LLMs) that generate fluent text by predicting the next word in a sequence. A newer approach focuses on building world models – systems that learn how the physical world behaves and changes over time.

In France, AMI Labs, led by pioneering AI researcher Yann LeCun, is pursuing this second path. LeCun argues that advanced machine intelligence (AMI) systems must learn through interaction, build internal representations, retain memory, and plan actions based on expected consequences.

AMI Labs has attracted attention and investment despite having no commercial product to date; reported multi-billion dollar valuations reflect the belief that world-model approaches will be vital for future high-reliability AI systems.

From generation to simulation: implications for healthcare

Robotics is an obvious application for world models. Robots must reason about balance, physical contact and trajectories in real time. Healthcare is another domain where world models will be important, because clinical decisions similarly involve predicting how complex systems – in this case biological ones – evolve in response to intervention.

Language models already create value in healthcare by summarising notes, drafting discharge letters and transcribing consultations. But these language-centred tasks differ in nature from the deeper reasoning required for clinical diagnosis and treatment planning.

Clinical reasoning uses probabilities, and clinicians continuously update likelihood estimates as new evidence arrives. At the same time, decision-making is grounded in how physiology changes over time and in response to treatment. Diagnosis and treatment planning draw on causal mechanisms as well as probabilistic pattern recognition.
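The probabilistic side of this reasoning is just Bayes' rule applied repeatedly. The sketch below is purely illustrative – the disease, sensitivity and specificity figures are made-up numbers, not clinical values – but it shows how a single test result shifts a clinician's likelihood estimate.

```python
# Toy sketch of Bayesian updating in diagnosis.
# All numbers are illustrative, not real clinical values.

def update_probability(prior: float, sensitivity: float,
                       specificity: float, test_positive: bool) -> float:
    """Return the posterior disease probability after one test result."""
    if test_positive:
        p_result_given_disease = sensitivity        # true positive rate
        p_result_given_healthy = 1 - specificity    # false positive rate
    else:
        p_result_given_disease = 1 - sensitivity    # false negative rate
        p_result_given_healthy = specificity        # true negative rate
    numerator = p_result_given_disease * prior
    evidence = numerator + p_result_given_healthy * (1 - prior)
    return numerator / evidence

# A clinician starts with 10% suspicion; a positive test
# (sensitivity 0.9, specificity 0.8) raises it to about 33%.
posterior = update_probability(0.10, 0.9, 0.8, test_positive=True)
print(round(posterior, 3))  # 0.333
```

Each new result feeds the previous posterior back in as the next prior, which is exactly the continuous updating described above.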

The layer cake

LeCun has reinforced this view with a simple metaphor: a multi-layered cake.

Figure: Yann LeCun’s layer cake representation of Advanced Machine Intelligence.

At the base of the cake are perception and action – i.e., sensing the world and acting within it. Above this sit world models that learn how the environment behaves over time. Memory and planning build on those. Language comes later, near the top: powerful and expressive, and dependent on the layers underneath. Generative AI, in this framing, is the icing, not the structure that holds everything up.

This hierarchy mirrors how humans learn. We do not understand the world by reading descriptions of it. We learn by acting, observing consequences and updating our expectations. Reading the rules of the road does not make someone a safe driver; skill develops through steering, braking, judging distance and experiencing outcomes.

This is a vital distinction because medicine is not inherently a linguistic domain. Disease is governed and constrained by biology, chemistry and physics, and evolves over time. Clinical decisions depend on how a living system responds to intervention, informed by, but not limited to, textual descriptions. Reasoning about physiological change requires the modelling of how biological systems behave.

Figure: A layered view of artificial intelligence in healthcare. Perception and action form the foundation, connecting the system to the physical world through sensors, signals and interventions. World models learn how biological systems behave and evolve over time, such as cardiac conduction, tumour growth, or haemodynamics. Memory provides longitudinal context, allowing past admissions, treatments and responses to inform current decisions. Planning and simulation use these layers to evaluate counterfactuals, compare treatment options and anticipate risk. Language models sit at the top, supporting communication, explanation and documentation. Clinical decisions depend on the lower layers that model physiological change resulting from intervention. In high-reliability domains, safe systems must behave in stable and auditable ways across all layers.

The JEPA architecture

At AMI Labs, the world model approach is formalised as Joint-Embedding Predictive Architectures (JEPA). Rather than generating text token by token or images pixel by pixel, as generative models do, JEPA models predict abstract representations of future states. When the wind blows, the model does not try to predict the motion of every leaf; it predicts that the tree will sway.
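The abstraction idea can be caricatured in a few lines of code. This is a deliberately simplified sketch of the principle – the encoder and predictor here are trivial stand-ins, not the actual JEPA architecture, and the numbers are invented.

```python
# Sketch of predicting in an abstract (embedded) space rather than in
# raw observation space. Illustrative only; not the real JEPA design.

def encode(raw_observation: list) -> float:
    """Collapse a noisy raw observation (e.g. every leaf position)
    into one abstract state (e.g. the overall sway of the tree)."""
    return sum(raw_observation) / len(raw_observation)

def predict_next_abstract_state(abstract_state: float, wind: float) -> float:
    """Predict the next abstract state under a disturbance, ignoring
    fine-grained detail that is unpredictable anyway."""
    return abstract_state + 0.5 * wind

leaves_now = [0.1, -0.2, 0.3, 0.0]   # individual leaf displacements (noisy)
sway_now = encode(leaves_now)         # abstract state: 0.05
sway_next = predict_next_abstract_state(sway_now, wind=0.4)  # 0.25
```

The model never commits to where each leaf will be; it predicts only the coarse, learnable quantity.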

This is close to how clinicians already think. A doctor may recognise that a patient is deteriorating without forecasting every future value of heart rate or blood pressure. Biology is noisy and unpredictable at fine levels of detail, but structured underneath. Exact trajectories are uncertain, but plausible futures are constrained. Clinical reasoning operates in this space, accepting uncertainty at the detailed level but developing reliable expectations based on underlying mechanisms.

Healthcare data fits this paradigm. World models are designed to learn from the data sequences that each episode of care provides – a state (the patient’s condition), an action (a treatment), and a new state (the outcome).
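Such a state–action–state sequence might be represented as follows. The field names and values are hypothetical, chosen only to make the structure concrete.

```python
# Hypothetical representation of an episode of care as
# (state, action, next_state) transitions; field names are invented.
from typing import NamedTuple

class CareTransition(NamedTuple):
    state: dict        # the patient's condition before the intervention
    action: str        # the treatment given
    next_state: dict   # the observed outcome

episode = [
    CareTransition({"hr": 110, "sbp": 95}, "iv_fluids",
                   {"hr": 98, "sbp": 108}),
    CareTransition({"hr": 98, "sbp": 108}, "antibiotics",
                   {"hr": 84, "sbp": 112}),
]

# A world model would be trained to map (state, action) -> next_state,
# with each transition's outcome becoming the next transition's state.
```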

Real-world implications

High-reliability domains such as aviation and nuclear power actively design variability out of critical processes. By embedding physical and biological constraints, world models can support more predictable behaviour suited to clinical settings that require high reliability.

For example, research suggests that simulation-based models may be useful in oncology, where tumour evolution as a result of treatment can be modelled. In cardiology, ECG-based systems learn the physics of cardiac conduction alongside surface signal patterns.

The French Connection

AMI Labs has partnered with Nabla, a company best known for clinician-facing generative tools that reduce documentation burden. The longer-term ambition is to combine fluent interfaces with deterministic, auditable clinical decision engines.           

Both AMI Labs and Nabla are based in Paris, whose AI ecosystem combines deep roots in fundamental machine learning with a growing health-tech sector. 

The digital twin

A “digital twin” is a unified model combining physiology, imaging, longitudinal data and interventions. In principle, such a system could simulate the effects of a treatment in an individual before it is prescribed.

For example, a beta-blocker prescription: a world model-based simulator could anticipate reduced heart rate, detect a dangerous blood-pressure drop, and flag the risk before treatment begins.
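A toy version of that simulator makes the idea tangible. The dose–response coefficients and the 90 mmHg threshold below are invented for illustration; a real digital twin would learn patient-specific dynamics from data.

```python
# Toy haemodynamic simulator for the beta-blocker example.
# Coefficients and threshold are illustrative, not clinical values.

def simulate_beta_blocker(hr: float, sbp: float, dose_mg: float):
    """Predict heart rate and systolic BP after a dose, and flag risk."""
    hr_next = hr - 0.4 * dose_mg     # expected heart-rate reduction
    sbp_next = sbp - 0.3 * dose_mg   # expected blood-pressure reduction
    warning = sbp_next < 90          # flag a dangerous pressure drop
    return hr_next, sbp_next, warning

# For this (hypothetical) patient, the simulated pressure drop
# crosses the threshold, so the risk is flagged before prescribing.
hr_after, sbp_after, risky = simulate_beta_blocker(hr=96, sbp=100, dose_mg=50)
```

The point is not the arithmetic but the workflow: the consequence is simulated, and the warning raised, before the treatment ever reaches the patient.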

Language models still play an important role as interfaces, translating between clinicians and underlying models. In this arrangement, as in the layer cake, decision logic resides in systems that model physiological consequence while language models support interaction and explanation.

Conclusion

Increasing the model size, training data and computational power of large language models has been the most reliable way to improve their performance to date. However, this approach is expensive and is delivering diminishing returns, with persistent concerns around reliability, explainability and reproducibility.

The valuations attached to AMI Labs are a bet on the future architecture of artificial intelligence. A likely outcome is a clearer separation of roles: language systems serving as interfaces, translating between clinicians and machines, with world models providing the reasoning core.

In healthcare, where decisions affect living people and outcomes are often irreversible, the move from AI systems that infer patterns in data to those that model causal consequences appears a necessary design choice.
