
By Mark Watts
Early in a career as a health system leader, you are asked to make decisions that seem, at best, like a 50/50 bet. I was crafting a corporate imaging data retention guideline for a 19-hospital system. The question was: “Do we save the CAD overlay box on mammogram studies or not?” I said, “No, let the radiologists see them in the moment but delete them from permanent storage.” What would you have done?
For the better part of a decade, the field of radiology has been abuzz with the promise of artificial intelligence. We’ve seen presentations, read the early papers, and marveled at the algorithms that can, in controlled settings, outperform human experts at specific, narrow tasks. The vision is seductive: an AI co-pilot that never tires, flagging subtle nodules on a chest CT, quantifying white matter hyperintensities on a brain MRI with perfect reproducibility, and prioritizing the acute stroke case on a worklist of hundreds of studies. This is the future we were promised – a future of augmented intelligence, reduced burnout and superhuman accuracy.
Yet, in the quiet, low-lit reading rooms where the real work of diagnosis happens, a different conversation is taking place. It’s a conversation laced with a healthy dose of professional skepticism, born from decades of experience. The central question is not “Can AI be accurate?” We know it can. The more pressing, more nuanced question is: “What happens when it’s wrong?”
An AI model is not a seasoned colleague with whom one can debate the finer points of a case. It is, for now, largely a black box. It presents its findings with an unblinking, uncalibrated confidence. And in that confident assertion lies a hidden danger – the potential for cognitive capture, for the human expert to be swayed by a fallible machine. This is not a failure of the radiologist, but a fundamental challenge of human-computer interaction.
The true frontier of AI in radiology is not just building more accurate algorithms, but designing systems that gracefully account for their own inevitable imperfections. It is a human factors problem. A recent study (1) has shed a powerful light on this very issue, offering compelling evidence that how we present AI findings to radiologists is just as important as the findings themselves. The study reveals that simple changes to the user interface and workflow can either amplify or mitigate the influence of an incorrect AI, with profound implications for patient safety.
I want to unpack the findings of this pivotal research and explore what it means for our profession. This isn’t just an academic exercise; it’s a roadmap for responsibly integrating these powerful tools into our clinical lives.
The Siren Song of Automation Bias
Before we dive into the study’s solutions, we must first intimately understand the problem. The core psychological phenomenon at play is automation bias. This is the well-documented tendency for humans to over-rely on automated systems, often trusting their output more than their own judgment. We see it in our daily lives: the driver who follows their GPS into a lake, the writer who accepts a grammatically correct but nonsensical suggestion from their word processor.
In the high-stakes environment of a radiology reading room, the conditions are ripe for this bias to take hold. A radiologist may interpret dozens, sometimes hundreds, of complex studies in a single day. The pressure for speed and efficiency is immense. Amidst this torrent of data, an automated tool that offers a definitive-seeming shortcut is incredibly alluring.
When an AI flags a 4 mm lung nodule, the radiologist is primed to agree. Finding that same nodule de novo might take minutes of careful scrolling and windowing. The AI offers it up instantly. This is a tremendous benefit when the AI is correct.
But what happens when the AI flags a confluence of pulmonary vessels and calls it a nodule? The study we’re discussing confirmed our deepest anxieties: incorrect AI results can, and do, influence radiologists to make a wrong decision. In the study’s control group, when an AI was wrong, the human radiologist was more likely to be wrong, too. The AI’s error became the radiologist’s error. This is the cognitive siren song: a beautiful, alluring call that can lead the unwary clinician onto the rocks of misdiagnosis. It’s a silent pull, a subtle suggestion that plants a seed of confirmation bias, prompting the radiologist to find evidence to support the AI’s claim rather than approaching the area with a neutral, diagnostic mindset.
To simply deploy AI tools into this environment without guardrails is to ignore decades of human factors research. It is an invitation for error. The study’s brilliance lies not in identifying this problem, but in testing practical, elegant solutions to it.
The Power of Permanence: To Delete or To Keep?
The first major insight from the study is as profound as it is subtle. Researchers tested two different psychological framings for the AI’s output.
- The “Keep” Condition: One group of radiologists was told that the AI’s findings – whether they agreed with them or not – would be permanently saved as part of the patient’s imaging record.
- The “Delete” Condition: The other group was told that the AI’s findings were merely suggestions. If the radiologist disagreed with the AI, its output would be completely deleted, leaving no trace in the patient’s file.
The results were striking. Radiologists in the “Delete” condition were significantly less likely to be swayed by an incorrect AI suggestion. They were more willing to overrule the machine and trust their own expertise.
Why would such a seemingly small change in workflow have such a large effect on diagnostic judgment? The answer lies in the psychology of accountability and professional friction.
When an AI’s finding is destined for the permanent record, it takes on an air of officialdom. For a radiologist to disagree with it creates a recorded conflict. The final report would essentially say, “I, the human expert, saw this case and made my determination, but a permanent part of this record is an algorithm that disagrees with me.” This creates cognitive dissonance. It introduces a layer of professional friction and a potential medico-legal shadow. A future physician, or worse, a lawyer in a malpractice suit, could point to that permanent AI finding and ask, “The computer saw something, doctor. Why did you ignore it?”
Conversely, in the “Delete” condition, the AI is framed as what it truly should be: a consultative, ephemeral input. It is a junior resident whispering a suggestion in your ear. You can consider it, accept it, or dismiss it without ceremony. If you dismiss it, it vanishes. The final report is yours and yours alone. The signature at the bottom carries the full weight of your professional judgment, unencumbered by a dissenting digital ghost.
This finding has massive implications for AI vendors and hospital IT departments. The prevailing wisdom might be to save everything, to create a complete data trail. But this study suggests that, from a human factors perspective, the ephemerality of AI suggestions can be a powerful safety feature. The best systems may be those that allow a radiologist to easily and completely dismiss an incorrect finding, empowering them to remain the ultimate arbiter of truth for that patient’s diagnosis without creating a confusing and potentially hazardous permanent record of disagreement.
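What might a “Delete”-style workflow look like in software? Below is a minimal sketch in Python of the suggestion lifecycle, assuming a hypothetical integration layer: the AISuggestion, PermanentRecord, and resolve names are illustrative inventions, not any vendor’s API. The essential design choice is the asymmetry: acceptance writes to the permanent record under the radiologist’s signature, while dismissal writes nothing at all.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Decision(Enum):
    ACCEPTED = auto()
    DISMISSED = auto()


@dataclass
class AISuggestion:
    """An ephemeral AI finding that exists only for the duration of the read."""
    description: str               # e.g., "4 mm nodule, right upper lobe"
    confidence: float              # model score, shown as advisory only
    decision: Optional[Decision] = None


@dataclass
class PermanentRecord:
    """Stand-in for the patient's permanent imaging record (e.g., PACS/RIS)."""
    validated_findings: list[str] = field(default_factory=list)


def resolve(suggestion: AISuggestion, accept: bool, record: PermanentRecord) -> None:
    """Apply the 'Delete' principle: only radiologist-validated findings persist."""
    if accept:
        suggestion.decision = Decision.ACCEPTED
        record.validated_findings.append(suggestion.description)
    else:
        suggestion.decision = Decision.DISMISSED
        # Deliberately no write: the dismissed suggestion leaves no trace,
        # so the signed report never carries a dissenting AI "opinion."


record = PermanentRecord()
resolve(AISuggestion("4 mm nodule, right upper lobe", 0.91), accept=True, record=record)
resolve(AISuggestion("nodule, left lower lobe", 0.62), accept=False, record=record)
assert record.validated_findings == ["4 mm nodule, right upper lobe"]
```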
The Humble Bounding Box: The Unsung Hero of AI-Assisted Diagnosis
The second, and arguably more powerful, finding from this research centers on the presentation of the AI’s results. The study showed that when the AI provided a simple bounding box – a visual outline drawn around the suspicious region – it fundamentally changed the dynamic between the human and the machine for the better.
This may sound obvious. Of course, showing the radiologist where the finding is located would be helpful. But the true genius of the box emerged when the researchers analyzed its effect in both scenarios: when the AI was right, and when the AI was wrong.
Scenario 1: The AI is Correct
When the AI correctly identified a pathology (like a lung nodule) and drew a box around it, the benefit was clear and expected. Radiologist performance improved. The box acts as a powerful attentional cue. It slashes search time, reduces the risk of perceptual errors (so-called “satisfaction of search,” where finding one abnormality makes you less likely to find others), and provides instant visual confirmation. This is the classic value proposition of Computer-Aided Detection (CADe) and it works splendidly.
Scenario 2: The AI is Incorrect
This is where the paradigm-shifting insight lies. The study found that even when the AI was wrong, providing a bounding box for its incorrect finding still improved the radiologist’s overall performance.
Let that sink in for a moment. An AI making a mistake – for instance, drawing a box around a benign vessel crossing and calling it a nodule – made the human radiologist better. How is this possible? This counter-intuitive result is the key to understanding how to build a truly synergistic human-AI partnership. There are a few complementary hypotheses for why this occurs.
- The Focused Scrutiny Hypothesis: Without a box, an incorrect AI finding might be a vague text alert, like “Suspicious nodule in the right upper lobe.” The radiologist then has to re-search a large anatomical area, a task prone to error. With a box, however, the cognitive task is transformed. The question is no longer “Is there a nodule somewhere in this lobe?” but rather, “Is the thing inside this specific box a nodule?” This is a much simpler, more constrained question. The radiologist can zoom in, change the window and level settings, and apply their expert knowledge to that precise location. They can more easily and confidently conclude, “No, that is not a nodule; it is clearly a vessel.” The box forces a focused second look, turning a passive search into an active process of confirmation or rejection. It allows the radiologist to kill the bad idea quickly and definitively.
- The “Intelligent Second Look” Heuristic: The bounding box acts as a prompt for re-evaluation, much like a colleague pointing to a film and asking, “What do you make of this spot here?” Even if your colleague is mistaken, their query forces you to engage with that specific area of the image with a higher level of scrutiny than you might have during a routine sweep. The AI, even in its error, is essentially flagging an area of anatomical complexity or ambiguity that might warrant a closer look anyway. In rejecting the AI’s specific hypothesis (nodule), the radiologist might notice a different, more subtle finding nearby (e.g., a faint area of ground-glass opacity) that they might have otherwise missed.
- Reducing Cognitive Load for Rejection: The act of rejecting a hypothesis is a cognitive task in itself. The bounding box makes this task easier. By localizing the “threat,” the AI allows the radiologist to neutralize it efficiently. This may prevent the lingering doubt or the need for time-consuming re-scans of the entire study that a vague, non-localized alert might trigger.
In essence, the bounding box turns the AI from a would-be oracle into a focused searchlight. Sometimes the searchlight illuminates the treasure. Other times, it illuminates a pile of rocks, but in doing so, it allows you to quickly identify them as rocks and move on, confident that you’ve thoroughly examined that spot. This is the definition of augmentation: the tool makes the human expert more efficient and more confident, both when it is right and when it is wrong.
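To make the focused-scrutiny idea concrete, here is a minimal sketch in Python of how a bounding box constrains the reading task, assuming the image is available as a NumPy array and the AI reports pixel-space box coordinates. The function names, margin, and window values are hypothetical, not drawn from the study; the point is that the box reduces “search this lobe” to “evaluate this crop.”

```python
import numpy as np


def crop_to_box(image: np.ndarray, box: tuple[int, int, int, int],
                margin: int = 16) -> np.ndarray:
    """Return the AI-flagged region plus a context margin for focused review.

    `box` is (row_min, col_min, row_max, col_max) in pixel coordinates.
    """
    r0, c0, r1, c1 = box
    r0, c0 = max(r0 - margin, 0), max(c0 - margin, 0)
    r1, c1 = min(r1 + margin, image.shape[0]), min(c1 + margin, image.shape[1])
    return image[r0:r1, c0:c1]


def apply_window(region: np.ndarray, center: float, width: float) -> np.ndarray:
    """Window/level transform so the reader can interrogate just the crop."""
    lo, hi = center - width / 2, center + width / 2
    return np.clip((region - lo) / (hi - lo), 0.0, 1.0)


# Hypothetical usage: a 12-bit chest radiograph and one AI-proposed box.
image = np.random.default_rng(0).integers(0, 4096, size=(2048, 2048)).astype(float)
flagged = crop_to_box(image, box=(810, 1120, 842, 1155))
display = apply_window(flagged, center=2048, width=3500)
```

The radiologist can then answer the simpler, constrained question – “Is the thing inside this crop a nodule, or a vessel?” – and dismiss the suggestion quickly if it fails that test.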
A Call to Action
This study is more than an interesting academic paper; it is a direct call to action. The findings provide a clear, evidence-based mandate for how we must proceed with the integration of AI into clinical practice. This responsibility is shared among three key groups.
1. For Radiology Practices and Hospital Administrators: The implementation of AI is not just a software purchase; it is a clinical pathway redesign. You are not just buying an algorithm; you are buying a human-computer interface that will directly impact patient care.
- Prioritize Human-Centered Design: When evaluating AI vendors, scrutinize the user interface (UI) and user experience (UX) with the same rigor you apply to the algorithm’s accuracy metrics. How are findings displayed? Is there a bounding box or segmentation mask? How easily can a radiologist dismiss, edit, or accept a finding?
- Embrace the “Delete” Principle: Work with vendors to create workflows where AI suggestions are treated as non-permanent, consultative inputs. The radiologist’s final signed report should be the single source of truth. The AI’s “opinion” should not haunt the patient’s record unless it has been explicitly validated and incorporated by the human expert.
- Pilot and Observe: Before a department-wide rollout, conduct pilot studies within your own environment. Observe how your radiologists interact with the tool. Gather feedback. The interaction in a controlled demo is very different from the reality of a high-volume Tuesday afternoon.
2. For Radiological Societies (RSNA, ACR, ESR, and others): As the professional bodies that guide our specialty, we have a duty to lead. The wild west of AI implementation must be tamed with thoughtful, evidence-based guidance.
- Formulate Clinical Practice Guidelines: We need to move swiftly to develop and disseminate guidelines for the clinical use of AI. These should address best practices for interaction, how to handle AI-human disagreements, and standards for the user interface.
- Standardize Reporting of AI Usage: A major debate is brewing: should the use of an AI tool be documented in the final radiology report? If so, how? Standardizing this language is essential for clinical clarity, legal protection, and patient communication. One hypothetical form such an addendum could take is sketched after this list.
- Develop AI Literacy Curricula: We must define and promote “AI literacy” for all radiologists – not just an understanding of the statistics, but a deep, practical knowledge of these human-factors principles, including automation bias and the importance of interface design. This should become a core part of residency training and continuing medical education.
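Purely as an illustration of what a standardized AI-usage addendum might look like – the tool name, version, and wording below are invented for this sketch, not drawn from any society’s guidance:

```python
def ai_usage_statement(tool: str, version: str, accepted: int, dismissed: int) -> str:
    """Render a hypothetical, standardized AI-usage addendum for a report.

    The wording is illustrative only; actual language would need to be
    defined by the professional societies and vetted for legal clarity.
    """
    return (
        f"AI-assisted interpretation: {tool} v{version} was used as a "
        f"decision-support aid. {accepted} AI suggestion(s) were validated and "
        f"incorporated by the interpreting radiologist; {dismissed} were "
        f"reviewed and rejected. The findings and impression reflect the "
        f"radiologist's independent judgment."
    )


# "ChestDetect" and "2.1" are placeholder values for this example.
print(ai_usage_statement("ChestDetect", "2.1", accepted=1, dismissed=2))
```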
3. For Every Practicing Radiologist: Ultimately, the responsibility for the final diagnosis rests with us. We are the final line of defense for our patients.
- Cultivate “Informed Skepticism”: We must learn to treat AI tools as we would a first-year resident: enthusiastic and often correct, but requiring constant supervision and verification. Never accept an AI finding without personally verifying it from the source data.
- Understand Your Tool’s Weaknesses: Demand more than a sales pitch from vendors. Ask for data on their algorithm’s failure modes. Where does it struggle? Does it perform poorly on certain patient populations or with certain scanner protocols? Knowing your tool’s limitations is as important as knowing its strengths.
- Be an Active Participant in Design: You are the end-user. Your feedback is invaluable. Advocate for better interfaces. If a tool’s workflow is clumsy or its presentation of findings is ambiguous, speak up. You are not just a consumer of this technology; you are a critical partner in its development.
Forging a True Partnership
The integration of artificial intelligence into radiology represents a monumental shift in the history of medicine. But its ultimate success will not be measured by the cleverness of our algorithms, but by the wisdom of our implementation.
The study reminds us that the human in the loop is not a bug to be engineered out, but the central feature of a safe and effective system. It proves that simple, thoughtful design choices – a bounding box to focus attention, a workflow that empowers the radiologist’s final judgment – can transform AI from a potential source of error into a robust tool for augmentation.
The future is not a radiologist replaced by a machine. The future is a radiologist, armed with decades of experience and nuanced judgment, working in a seamless, intelligent partnership with tools designed to augment their strengths and guard against their weaknesses. By focusing on the human factors, by building systems that are forgiving of the AI’s flaws and respectful of the human’s expertise, we can deliver on the promise of AI and elevate the standard of care for every patient we serve.
(1) Bernstein MH, Atalay MK, Dibble EH, Maxwell AWP, Karam AR, Agarwal S, Ward RC, Healey TT, Baird GL. Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multireader pilot study of lung cancer detection with chest radiography. European Radiology (2023). https://doi.org/10.1007/s00330-023-09747-1
Mark Watts is an experienced imaging professional who founded an AI company called Zenlike.ai.

