In October 2014, Elon Musk famously likened the use of AI to summoning a demon. He was no doubt referring to the existential threat of using technology that is unknowable: if it changes in ways that we can’t predict, how can we hope to control it? How can we guarantee that it will be safe to use?

Now, regulators are asking themselves the same questions. Recent years have seen global recommendations put in place for regulating AI, but it is very much an emerging field – many agencies are still in the research phase of developing clear and defined guidelines.

In January 2021, the FDA released an action plan describing how they mean to establish a regulatory framework for medical devices that use AI. While this is an exciting step forward, it’s only the beginning of the journey. The plan should be understood as a statement of intentions – a road map towards an enforceable set of rules. As such, the plan explains how the agency will proceed with investigating five key areas relating to software as a medical device (SaMD): creating a regulatory framework for devices that change over time, establishing best practices in AI/ML, fostering a patient-centred approach, managing algorithmic bias and robustness, and monitoring real-world performance of devices. It is informed by stakeholder feedback on a 2019 FDA discussion paper entitled, ‘Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning-Based Software as a Medical Device’.

The work that has already been done to assess and regulate the use of AI in medical devices gives some clues on how the action plan is likely to be used and refined. By considering how the legal, ethical and practical challenges identified could realistically be addressed, it’s possible to get a sense of what the eventual regulations will look like.

Safe changes

To foresee future patient risks, the FDA is proposing a framework whereby manufacturers identify and describe which aspects of their devices will change as they learn, and how their algorithms will remain safe and effective throughout. As originally laid out in the 2019 discussion paper, this ‘Predetermined Change Control Plan’ consists of “SaMD Pre-Specifications” identifying the aspects of a device that will be modified through machine learning and an “Algorithm Change Protocol”, which explains the methodology for implementing those changes without compromising safety or effectiveness. Even with this information from a manufacturer, it can still be difficult for a regulator to predict a device’s true risk over the course of its life cycle.
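One way to picture how the two parts of such a plan could work together is as a simple gate on proposed updates. The sketch below is purely illustrative and does not reflect any FDA-specified format: the class names, the `review_update` function and the performance floors are all hypothetical, chosen only to show the idea of checking a change against what was declared in advance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeControlPlan:
    """Hypothetical stand-in for a predetermined change control plan."""
    pre_specified_changes: frozenset  # aspects the manufacturer declared may change
    min_sensitivity: float            # illustrative performance floor (assumption)
    min_specificity: float            # illustrative performance floor (assumption)


@dataclass(frozen=True)
class ProposedUpdate:
    """Hypothetical description of one model update and its measured performance."""
    changed_aspect: str
    sensitivity: float
    specificity: float


def review_update(plan, update):
    """Return (accepted, reason) for a proposed model update."""
    # Pre-specification check: was this kind of change declared in advance?
    if update.changed_aspect not in plan.pre_specified_changes:
        return False, "change was not pre-specified"
    # Protocol check: does the updated model still meet the agreed floors?
    if update.sensitivity < plan.min_sensitivity:
        return False, "sensitivity below the agreed floor"
    if update.specificity < plan.min_specificity:
        return False, "specificity below the agreed floor"
    return True, "within the pre-specified envelope"


# Made-up example values, for illustration only.
plan = ChangeControlPlan(frozenset({"retraining on new data"}), 0.90, 0.85)
retrain = ProposedUpdate("retraining on new data", 0.93, 0.88)
new_use = ProposedUpdate("new intended use", 0.95, 0.90)

print(review_update(plan, retrain))
print(review_update(plan, new_use))
```

In this toy version, an update that performs well is still rejected if the kind of change was never declared, which mirrors the idea that the plan covers only modifications anticipated up front.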

Continuous learning devices can be extremely sensitive to changes in training data, which can cause a big change in performance, explains Johan Ordish, group manager for medical device software and digital health at the MHRA in the UK. That creates extra challenges for regulators trying to classify a device, which the MHRA does based on its risk profile.

In its 2019 discussion paper, the FDA acknowledges that it may not be possible to predict how a device will change. Its proposed framework is intended as a process for using available evidence to identify as much future risk as possible at a premarket stage. In other words, it’s a way to make the best of what’s available.

While the MHRA currently has requirements aimed at managing a device’s risk across its life cycle – placing the responsibility on the manufacturer to ensure that the performance of their model is maintained over time – frameworks such as the FDA’s “predetermined change control plan” could help with the challenge of anticipating risk change in continuous learning devices, says Ordish.

Development of the framework is likely to be iterative and the FDA plans to issue draft guidance that will be open to public comment later in 2021.

Patients and practitioners

Manufacturers will also be required to provide information to users about the risks, benefits and limitations of their devices, per the FDA action plan’s commitment to building patient trust and transparency. However, a truly patient-centric approach must involve healthcare professionals. It is clinicians who obtain patient consent, which could be the deciding factor in whether a device is used at all. Clinicians are not mentioned within the plan.

The FDA does not regulate the practice of medicine nor the conduct of clinicians, but can create obligations that ensure they are as supported as possible when using AI medical devices.

“What the FDA can do is mandate some requirements and put [it] on the manufacturer, for example, to offer some educational programmes to physicians,” says Harvard research fellow in medicine, artificial intelligence, and law, Sara Gerke, whose research on regulating AI is referenced in the FDA’s action plan.

Clinicians could be trained to use AI in the same way they’re trained to give medication, says Dr Indra Joshi, director of AI for NHSX. While they may not know the ins and outs of its chemical properties, clinicians know what medication does and what to do if something goes wrong. “If you just take that same ethos and apply it to AI, [you can] say: ‘This is what you need to look out for and this is what you need to do’,” she explains. Communicating this could be as simple for manufacturers as putting in place a service-level agreement with the clinicians that prescribe and use their devices.

Even so, manufacturers should be mindful that increased user transparency doesn’t come at the cost of usability or efficacy, notes Timo Minssen, director of the Centre for Advanced Studies in Biomedical Innovation Law at the University of Copenhagen. For example, a device that detailed each of its calculation steps and intermediate decisions to doctors could make them feel obliged to question each of those decisions. For Minssen, prompting humans to give approval or intervene at each stage largely defeats the purpose of using AI, slowing it down and undermining its effectiveness. “We need interfaces that enable users to better understand underlying computing processes,” suggests Minssen. “But if we take [transparency] too far then [the device] just becomes a tool.” Instead, an appropriate level of user understanding could be achieved by explaining key aspects of how a system works, including how it arrives at its decisions. “I think it would be desirable if the [device’s] assumptions and restrictions, the operating files, data properties, and output decisions can be systematically explained and validated,” says Minssen.

Algorithmic bias

Like the humans that build them, machine learning applications can carry many different types of bias. A coder could unintentionally transfer their world view into parts of the algorithm by having it solve problems in certain ways, or an algorithm could be trained on a data set that isn’t representative of the population for which it is intended. Bias can also be contextual, for example, if an algorithm were created to function within a certain setting and moved to another, such as from a specialist clinic to a rural hospital.

“The best we can do is try to mitigate biases as much as possible, by identifying the different buckets where bias can exist and then developing standards for […] what each AI manufacturer has to show [and to do] to mitigate this bias,” says Gerke. Regulators could, for example, require manufacturers to meet certain standards for data collection or to demonstrate how they have taken steps to minimise bias. As well as looking at the data itself, manufacturers could also consider whether the team developing the algorithm might risk introducing bias, explains Minssen. This might happen if the device is intended for people who breastfeed but no one on the development team has breasts. It’s important that awareness of potential bias is baked into the development of a device, in terms of both performance and patient safety. “People don’t see issues of bias as safety issues, they think of them as a kind of woolly social science question,” says Ordish. “There’s a direct and important safety issue about having representative data and ensuring that your model isn’t biased.”

The FDA’s plan to develop a methodology for identifying and eliminating bias is currently at the research stage.

Real-world performance

Who is responsible for a device’s safety as it moves from the lab to the doctor’s office and from there into a patient’s hand? Per the action plan, the FDA will run a pilot monitoring programme that collects device performance data to inform an eventual framework for real-world monitoring. As with patient centricity, effective post-market monitoring needs to engage healthcare professionals as well as manufacturers, points out Prof Dr Christian Johner, director of the Johner Institute for Healthcare IT. “The system needs to be improved to systematically collect and assess the post-market surveillance data,” he says. “We [currently] rely on the manufacturer being honest enough to report any incidents, and the collection and assessment of field data reported by healthcare professionals is still not really working.”

Johner predicts that manufacturers will most likely be required to record usage data in future FDA monitoring regulations, in line with the broader industry shift towards relying on manufacturer data and evidence. Post-market monitoring should also include ongoing scrutiny of a device’s risk, he adds.

In the UK, patients can report issues with a device through the MHRA’s yellow card scheme. However, as this is not obligatory, nor done through the device itself – users must give feedback through a dedicated MHRA website – the regulator can only do so much with what is reported, says Ordish.

It may be a few years before there are clear regulations for AI devices, but it is good practice for manufacturers to think critically about the ethical, practical and legal challenges of using AI in medical devices at the beginning of development, rather than trying to adjust the systems of a finished product.

It’s also essential not to lose sight of patient outcomes. “Many products may be safe and effective, but they don’t necessarily improve patient outcomes,” says Gerke. “We don’t necessarily need another AI that does not do any good in the long term.”


4: AI devices, of the 130 approved by the FDA between January 2015 and December 2020, that were studied prospectively.

17: Studies, of the 130 AI devices approved by the FDA, which considered how the algorithm might perform across different patient groups.

Nature Medicine

Evaluation needed for AI 

AI-powered medical devices approved by the FDA under its current framework lack critical data on performance and bias, according to research newly published in Nature Medicine. All but four of the 130 AI devices approved by the FDA between January 2015 and December 2020 were only studied retrospectively. Prospective studies are needed to evaluate a tool’s clinical use and interactions with hospital systems, which, particularly given the complexity of AI, are likely to differ from what developers can envisage.

The Stanford University research team also noted a lack of diversity in studies, with just 17 of the 130 considering how the algorithm might perform across different patient groups. Using their own AI models for triaging potential cases of pneumothorax, the authors found substantial drop-offs in performance when the tools were implemented at new sites, as well as disparities in results across different patient demographics. The report recommends that AI devices be evaluated at multiple sites, with prospective studies that compare them with the standard of care. Post-market surveillance is also necessary to identify and measure unintended outcomes and biases missed in trials.

Source: Nature Medicine
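The kind of check the researchers recommend, comparing performance across sites or patient groups, can be sketched in a few lines. This is a minimal illustration rather than the study's method: the function names are hypothetical, the records are invented toy data, and plain accuracy stands in for whatever clinical metric a real evaluation would use.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, prediction, truth) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


def largest_gap(scores):
    """Difference between the best and worst performing groups."""
    return max(scores.values()) - min(scores.values())


# Invented toy data: (site, model prediction, ground truth). Not real results.
records = [
    ("site_A", 1, 1), ("site_A", 0, 0), ("site_A", 1, 1), ("site_A", 0, 0),
    ("site_B", 1, 0), ("site_B", 0, 0), ("site_B", 1, 1), ("site_B", 0, 1),
]

scores = accuracy_by_group(records)
print(scores)               # per-site accuracy
print(largest_gap(scores))  # spread between best and worst site
```

The same function works for demographic groups if the first field of each record holds a group label instead of a site name. A large gap between groups would be a prompt for further investigation, not a verdict on its own.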