Health Systems Action

We need somewhere to test AI

Years ago, my hospital in the US was considering buying an anaesthesia information system. This was a major purchase in the 1-2 million dollar range.

The system would automatically record vital signs, document anaesthesia care and become part of the daily workflow of every operating room (OR) in a large academic medical center.

During the sales process I asked the vendor a simple question: “Could we try it in one operating room first?”

Just one OR. A short trial. We’d run the system for a few weeks, see how it worked in practice and decide whether to roll it out more widely.

“No,” said the vendor. “Impossible. We never do that.”

Joseph Wright of Derby – An Experiment on a Bird in the Air Pump (1768).

In the end, we bought the system anyway. It was expensive and disruptive to install. Once clinicians started using it, other challenges appeared: workflow worsened, devices weren’t fully integrated, billing was still on paper, and the promised efficiencies and safety gains never materialised.

Looking back, the problem was obvious: we had installed a complex clinical technology across the entire system without ever testing it properly in the environment in which it would be used.

In healthcare we’ve repeated this mistake many times.

For example, the rollout of Electronic Health Record (EHR) systems rapidly “digitised” us but added alert fatigue and clerical burdens that contributed significantly to clinician burnout.

We’re doing it again with artificial intelligence.

Epic’s sepsis-prediction AI is a case in point. Validated on clean, retrospective data, it was rolled out to multiple hospitals but performed poorly in those real world settings and was ultimately withdrawn.

The problem with artificial intelligence in healthcare today

AI is rapidly entering clinical practice, assisting with imaging interpretation, clinical documentation, administrative processes, clinical prediction and decision-making.

But AI tools are typically developed and validated in environments that differ from their actual setting of use. Algorithms are trained on historical datasets, evaluated retrospectively, then offered as finished products.

What happens when they enter the real world?

Pieter Bruegel the Elder – The Tower of Babel (1563). Complex systems are easier to build than to control.

In many cases, we simply don’t know.

A range of responses is possible. Some clinicians ignore the tools. Others may end up over-relying on them. Outputs and performance are likely to change when systems encounter new populations and messy, real-world data.

The big question for a hospital leadership team: can the whole system (data, technology, clinicians, workflow) operate safely and effectively in our place?

At the moment, we lack mechanisms to systematically assess this before deployment.

The case for a healthcare AI sandbox

One promising solution is something called a regulatory sandbox [UK/Commonwealth: sandpit].

In the financial world, regulators realised that new digital products were evolving faster than traditional regulatory processes. They created controlled environments where companies could test innovations under supervision.

A prime example: the UK Financial Conduct Authority’s regulatory sandbox, launched in 2016, allows firms to test innovative products in a controlled, live environment.

Can the same idea work in healthcare?

Singapore’s new Artificial Intelligence in Healthcare Guidelines (AIHGle 2.0) encourage controlled testing environments including regulatory sandboxes for evaluating new AI systems.

The idea is simple: instead of immediately installing a new application across an entire hospital system, evaluate and refine it in a controlled setting where safety risks and operational impact can be assessed and adjustments made.

A real healthcare AI sandbox

Versions of this approach are already in play.

The American state of Utah has created an AI regulatory sandbox. One of the first projects involves a company called Doctronic which uses AI to automate parts of the routine prescription renewal process.

Renewing medications for chronic conditions, such as blood pressure medicines or statins, is surprisingly slow and bureaucratic. It requires contacting a doctor’s office, waiting for approval and coordinating with pharmacies.

Under the Utah sandbox program, the AI system can handle certain routine renewals without human intervention. The patient’s identity is verified, their prescription history is reviewed, a short clinical “interview” is conducted, and the AI decides whether to renew the medication or escalate the request to a human clinician.

Safety rules are defined, with physician oversight and ongoing reporting to regulators. Only medications that have already been prescribed by a licensed clinician can be renewed. High-risk drugs and controlled substances are excluded.
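The guardrails described above can be pictured as a simple gate that runs before any automated renewal. The sketch below is purely illustrative and is not Doctronic’s or Utah’s actual implementation: the field names, the excluded drug classes and the choice to hand every failed check to a human clinician are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical exclusion list; the real programme's rules are set with regulators.
EXCLUDED_CLASSES = {"controlled_substance", "high_risk"}


@dataclass
class RenewalRequest:
    identity_verified: bool        # patient identity check passed
    previously_prescribed: bool    # already prescribed by a licensed clinician
    drug_class: str                # illustrative label, e.g. "statin"
    interview_flags: list = field(default_factory=list)  # concerns from the interview


def triage(request: RenewalRequest) -> str:
    """Return "renew" only if every safety check passes, otherwise "escalate"."""
    if not request.identity_verified:
        return "escalate"
    if not request.previously_prescribed:
        return "escalate"
    if request.drug_class in EXCLUDED_CLASSES:
        return "escalate"
    if request.interview_flags:
        return "escalate"
    return "renew"


routine = RenewalRequest(True, True, "statin")
excluded = RenewalRequest(True, True, "controlled_substance")
print(triage(routine), triage(excluded))
```

The point of the design is that the automated path is the exception: a request is renewed only when nothing at all gives cause for doubt, and everything else goes to a person.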

Whether this AI ultimately succeeds or fails is almost beside the point. The important thing is that it’s being studied systematically first.

Goals of the healthcare sandbox

A healthcare AI sandbox would focus on a few specific activities.

First, technical validation across diverse datasets to understand performance, bias and robustness.

Second, clinical evaluation in real workflows to see how clinicians interact with the system.

Third, operational impact: whether workload is reduced and quality improves or whether bottlenecks and other unintended effects emerge.

Finally, generating and sharing independent evidence that healthcare organisations and regulators can trust.
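To make the first of these goals concrete, one basic check a sandbox might run is to compare a tool’s accuracy site by site rather than as a single pooled figure. The following is a minimal sketch under stated assumptions: the record layout (site, prediction, true outcome) and the function name are hypothetical, and a real evaluation would add confidence intervals, calibration and subgroup analyses.

```python
from collections import defaultdict


def per_site_metrics(records):
    """Compute sensitivity and specificity per site.

    records: iterable of (site, predicted, actual) tuples, where
    predicted and actual are booleans (hypothetical layout).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for site, predicted, actual in records:
        if predicted and actual:
            key = "tp"
        elif predicted and not actual:
            key = "fp"
        elif not predicted and actual:
            key = "fn"
        else:
            key = "tn"
        counts[site][key] += 1

    results = {}
    for site, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[site] = {
            # None marks a site with no cases of that kind to score.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


sample = [
    ("A", True, True), ("A", False, True), ("A", False, False),
    ("B", True, False), ("B", True, True),
]
print(per_site_metrics(sample))
```

A tool that looks acceptable on pooled data can still perform very differently from one hospital to the next, which is the kind of gap a sandbox is meant to surface before deployment.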

Building sandboxes

Healthcare systems around the world are under pressure to adopt AI. Vendors are enthusiastic, investors optimistic and policymakers see enormous potential. But responsible innovation requires a measured approach.

A healthcare AI sandbox could be created by a consortium of hospitals, insurers, universities and technology companies.

Costs could be covered through a mix of public innovation grants and private “testing fees” from vendors.

Strict “privacy-by-design” will be needed; synthetic datasets may be useful.

Evidence about how AI tools work in practice will make adoption decisions easier for healthcare organisations while helping to avert costly and harmful mistakes.

Before we deploy AI everywhere, shouldn’t we try it first in one “operating room”?

References

  1. Financial Conduct Authority, UK. Regulatory sandbox lessons learned report, 2017. https://www.fca.org.uk/publications/research/regulatory-sandbox-lessons-learned-report
  2. Ministry of Health Singapore; Health Sciences Authority. Artificial Intelligence in Healthcare Guidelines (AIHGle 2.0). March 2026. 
  3. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. Nature Medicine. 2019;25:44–56. 
  4. Utah Department of Commerce. News release: Utah and Doctronic announce groundbreaking partnership for AI prescription medication renewals. 6 January 2026. 
  5. Utah Office of Artificial Intelligence Policy. Doctronic AI Mitigation Agreement. 
  6. U.S. Food and Drug Administration. Good Machine Learning Practice for Medical Device Development: Guiding Principles. 
  7. Sheppard Mullin. Testing the boundaries of artificial intelligence in care delivery: Utah’s prescription renewal pilot program. 
