Health Systems Action

Rethinking Patient-Reported Outcomes in the age of generative AI

Patient-reported outcomes (PROs) capture how people experience illness, treatment and day-to-day health. As artificial intelligence (AI) evolves, generative AI tools offer new ways of gathering and interpreting this information. A recent paper (Boyer et al., npj Digital Medicine, 2025) highlights both the limitations of current approaches and the possible contribution of AI. These ideas are promising, but many are untested and will need careful evaluation.

What current PROs capture, and what they miss

Most modern patient-reported outcome measures (PROMs) still resemble the structured questionnaires developed for clinical trials in the mid-20th century. Later advances, such as item-response theory (IRT) and computer adaptive testing (CAT), improved efficiency, but the underlying format is fixed: defined symptom categories, numerical scales and standard scoring.

This structure is useful. PROMs offer comparability between patients and over time, and in many areas (for example depression screening with PHQ-9) they perform reliably. But they also simplify complex experiences. A single “fatigue” score might reflect medication effects, poor sleep, low mood, caregiving stress or social pressures. Because nuance is inevitably lost, clinicians often perceive PROMs as administrative tools rather than clinically rich sources of information.

Structural and equity gaps

The limitations are not only in the questionnaires themselves. In many settings, PRO data is still poorly integrated into routine care, and often lives in parallel systems separate from medical records. Participation is uneven: people with lower literacy, limited digital access, or language barriers are less likely to be represented. These issues are relevant globally and in South Africa, where multilingualism and variable access can easily distort who is “heard”.

How generative AI could help

Generative AI, particularly large language models (LLMs), could broaden and enrich PRO collection in several ways. Some capabilities are already feasible; others remain experimental.

Near-term possibilities include:

  • Conversational data collection. AI systems could gather information through short voice or text dialogues, asking relevant follow-up questions, translating automatically and reducing literacy demands.
  • Text summarisation and theme extraction. LLMs can analyse open-ended responses and produce short, structured summaries that give context to standard PROM scores.
  • Making narrative data easier to use. Free-text descriptions are difficult to interpret at scale. AI tools could highlight common themes such as medication side-effects, transportation problems or emotional distress.

Longer-term or experimental possibilities include:

  • “Bottom-up” modelling. Rather than starting from predefined constructs (e.g., “pain severity”), models could infer patterns directly from patient narratives. For example, a model might detect links between sleep disruption, mobility limits and medication effects without being instructed to group these items together. This is conceptually attractive but requires new evidence and validation.

Potential applications

If carefully developed, AI-supported PROs could contribute in a range of clinical and public-health areas.

  • Public health programmes. Conversational tools in local languages could capture feedback on chronic disease treatment adherence – e.g. HIV, TB, hypertension. Aggregated summaries might highlight common obstacles such as transport costs or side-effect concerns.
  • Rehabilitation and home-based care. Narrative check-ins could help monitor post-surgical pain, sleep or mobility, giving physiotherapists and home-care teams earlier insight into recovery challenges.
  • Mental health screening. AI-generated summaries may help primary-care teams recognise changes in mood or distress signals, especially where mental health capacity is limited.
  • Routine service quality. Aggregated PRO themes could support district-level service reviews, complementing traditional performance indicators.
  • Social and policy planning. Narrative data may reveal links between housing, transport, workload or safety and people’s health experiences, informing broader interventions.
  • Private-sector and insurance use. Where digital infrastructure exists, PROs are already used to track outcomes under value-based or chronic-disease management programmes. AI tools could augment these systems with narrative context.

Two development paths

The npj Digital Medicine paper describes two possible ways forward:

  • Incremental improvement. LLMs are used to enhance traditional PROMs, providing clearer explanations to patients, generating summaries for clinicians, or enabling multilingual delivery, while preserving familiar scoring systems. This is achievable in the short term.
  • Full redesign. PROMs become open-ended conversations interpreted by AI models. Scores or constructs emerge from patterns in the narrative, rather than predefined items. This approach is conceptually appealing but raises major questions about reliability, interpretability and fairness.

Integration, validation and safety

PROMs have accumulated decades of psychometric testing. AI-enhanced methods would require new forms of evaluation:

  • Robustness: Do models behave predictably across different prompts, updates and versions?
  • Accuracy: Do summaries reliably reflect what patients actually said?
  • Clinical validation: Do AI-derived insights add value to decision-making?

Multilingual settings bring added complexity. Models must be tested for consistency across languages – e.g., English, isiXhosa, isiZulu and Afrikaans – and must correctly interpret idioms, cultural expressions and indirect communication styles. Speech transcripts may contain incidental identifiers, raising questions about storage, retention and privacy.

Workflow burden is another concern. Even well-designed AI tools can become unusable if they lengthen already busy consultations. South Africa’s fragmented digital landscape means that simple formats – short summaries that can be added to existing electronic or paper records – may be more feasible than fully integrated systems.

Governance and fairness will be essential. Questions include who owns the processed data, how secondary uses are explained to participants, how model drift is monitored, and how biases are identified and corrected. Local Health Research Ethics Committees, along with POPIA compliance, will play an important role in overseeing these issues.

Economic considerations matter too. AI adoption brings costs beyond licensing, including integration, training, monitoring and governance. In some settings, investing in basic PRO infrastructure may offer more immediate gains.

Conclusion

As AI reshapes healthcare, the outcomes that matter most to patients risk being overlooked. Generative AI may help strengthen the patient voice by capturing narrative detail that traditional PROMs cannot. But progress will depend on careful evidence-building, transparent governance and pragmatic integration into real clinical workflows.

Small, well-defined pilot projects, especially multilingual ones, can test feasibility, reliability and acceptability. PROMs are unlikely to be replaced, but they may be enriched in ways that make them more clinically meaningful and more reflective of the realities of people’s lives. Without this groundwork, AI-enabled PROs will remain an interesting but unrealised idea.

References

Boyer E et al. Reimagining patient-reported outcomes in the age of generative AI. npj Digital Medicine. 2025. https://www.nature.com/articles/s41746-025-02006-1
