By Mark Watts
Large language models (LLMs) have the potential to transform health care. But when will this transformation occur? In my experience working in a large health care organization, there is still clear caution against using LLMs for tasks as common as translation: “Never use AI for translation!”
Currently, most LLMs are released globally without country-specific iterations, requiring regulators to adopt a global approach. However, it remains unclear which technical category LLMs will fall into from a regulatory perspective. Given the differences between LLMs and earlier deep learning methods, there may be a need for a new regulatory category to address the unique challenges and risks posed by LLMs.
Regulators may need to design regulations for LLMs if developers claim that their models can be used for medical purposes, or if LLMs are explicitly developed or adapted for such uses. While many general-purpose LLMs might not meet these criteria, medical-specific LLMs trained on health care data likely will.
One notable example is Med-PaLM, a model developed by DeepMind and Google researchers. Their study introduced a framework for evaluating model outputs along several axes, including factuality, precision, potential harm and bias. Using novel prompting strategies, Med-PaLM achieved 67.6% accuracy on questions from the U.S. Medical Licensing Examination (USMLE), surpassing the previous state of the art. Although promising, its answers still fell short of those of human clinicians. GPT-4 has since achieved over 85% accuracy on the same exam.
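To make the multi-axis evaluation idea concrete, here is a minimal sketch in Python of how clinician-rated answers might be aggregated along such axes. The axis names follow the study, but the `AxisRatings` structure, the 1-to-5 scale and the simple averaging are illustrative assumptions, not Med-PaLM's actual evaluation harness.

```python
from dataclasses import dataclass
from statistics import mean

# Axes loosely following the Med-PaLM evaluation framework. The 1-5
# rubric scale and the plain averaging below are illustrative
# assumptions, not the study's actual scoring protocol.
AXES = ("factuality", "precision", "potential_harm", "bias")

@dataclass
class AxisRatings:
    """Clinician ratings (1 = worst, 5 = best) for one model answer."""
    factuality: float
    precision: float
    potential_harm: float  # higher = safer, under this toy convention
    bias: float

def summarize(ratings: list[AxisRatings]) -> dict[str, float]:
    """Average each axis across all rated answers to compare models."""
    return {axis: mean(getattr(r, axis) for r in ratings) for axis in AXES}

if __name__ == "__main__":
    rated = [
        AxisRatings(factuality=4, precision=4, potential_harm=5, bias=5),
        AxisRatings(factuality=3, precision=4, potential_harm=4, bias=5),
    ]
    print(summarize(rated))  # per-axis averages, e.g. {'factuality': 3.5, ...}
```

The point of such a rubric is that a single accuracy number hides exactly the dimensions regulators care about; breaking scores out per axis makes harm and bias visible alongside factual correctness.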
With GPT-4’s capabilities expanding to analyze not just text but also images, it’s expected that future models will be able to process documents, research papers, handwritten notes, audio and video.
Regulatory discussions around LLMs, although not limited to health care, have begun to surface. A working paper by Hacker et al. proposes new terminology to describe the AI value chain, differentiating among developers, deployers, professional users and recipients of LLM outputs. The authors argue that regulation should focus on concrete high-risk applications rather than on pre-trained models themselves, and suggest four key strategies to ensure the trustworthy deployment of LLMs: (i) transparency obligations, (ii) risk management protocols, (iii) non-discrimination provisions and (iv) content moderation rules.
Existing auditing procedures, however, fail to adequately address the governance challenges posed by LLMs. To bridge this gap, I propose three contributions: 1) establishing the need for new auditing procedures tailored to LLM risks; 2) outlining a blueprint for auditing LLMs based on best practices from IT governance and systems engineering; and 3) discussing the inherent limitations of auditing LLMs.
These potential solutions could serve as benchmarks for new health care regulations. Because LLMs are evolving so rapidly, it is crucial for regulators and lawmakers to act swiftly.
The urgency of regulatory action was underscored in March 2023, when a group of prominent figures, including Elon Musk and Steve Wozniak, called for a six-month pause on the training of AI systems more powerful than GPT-4. Their open letter expressed concern over an “out-of-control race” in AI development and urged a public and verifiable pause to ensure safety.
However, AI experts like Andrew Ng argued against a moratorium, emphasizing the need to balance the immense value AI brings against realistic risks. He pointed out that a government-imposed pause could hinder competition and innovation.
Additionally, it’s worth noting that Italy became the first Western nation to temporarily block ChatGPT in April 2023 due to privacy concerns and regulatory gaps.
While LLMs hold tremendous promise for the future of health care, their use also presents risks and ethical dilemmas. By adopting a proactive regulatory approach, we can harness the potential of AI-driven technologies like LLMs while minimizing harm and maintaining trust among patients and health care providers.
Moreover, LLMs could pioneer a new category of AI-based medical technologies regulated through patient-centered design. This would involve patients in the highest levels of decision-making, ensuring that these rapidly evolving AI tools address real clinical and patient needs.
Regulation cannot focus solely on existing LLMs; it must also anticipate future iterations, which are likely to arrive at a similar pace. Without this foresight, regulations targeting only current models may miss significant updates as they become available. Companies that have already obtained FDA approval for their medical technologies will face additional challenges when integrating LLMs, as they must navigate how new AI components will be regulated.
Here is a summary of what we can expect regulators to do as LLMs are brought into the practice of medicine:
- Create a new regulatory category for LLMs, as they are distinctly different from the AI-based medical technologies that have already gone through regulation.
- Provide regulatory guidance for companies and health care organizations on how they can deploy LLMs within their existing products and services.
- Create a regulatory framework that covers not only text-based interactions but also possible future iterations, such as the analysis of audio or video.
- Provide a framework for distinguishing between LLMs specifically trained on medical data and LLMs trained for non-medical purposes.
- Regulate the companies developing LLMs rather than every single LLM iteration, similar to the FDA’s Digital Health Software Precertification (Pre-Cert) Program.
– Mark Watts is an experienced imaging professional who founded an AI company called Zenlike.ai.