Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology
Nur Yildirim, Hannah Richardson, Maria T. Wetscherek, Junaid Bajwa, Joseph Jacob, Mark A. Pinnock, Stephen Harris, Daniel Coelho de Castro, Shruthi Bannur, Stephanie L. Hyland, Pratik Ghosh, Mercy Ranjit, Kenza Bouzid, Anton Schwaighofer, Fernando Pérez-García, Harshita Sharma, Ozan Oktay, Matthew Lungren, Javier Alvarez-Valle, Aditya Nori, Anja Thieme
TL;DR
This paper investigates how vision-language models (VLMs) can augment radiology workflows through a human-centered, three-phase design process. By engaging 13 radiologists and clinicians, it identifies four clinically relevant use concepts (Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights), and uses prototype sketches and user feedback to surface design requirements. The study finds that VLMs offer value in information extraction, evidence retrieval, and workflow support, but that their usefulness is constrained by AI performance, latency, risk, and the need for seamless integration into fast-paced clinical practice. The findings inform practical guidelines for deploying VLMs in radiology and broader healthcare contexts, stressing task-specific tooling, EHR alignment, and human-in-the-loop governance.
Abstract
Recent advances in AI combine large language models (LLMs) with vision encoders, bringing forward unprecedented technical capabilities that can be leveraged for a wide range of healthcare applications. In the domain of radiology, vision-language models (VLMs) achieve good performance on tasks such as generating radiology findings based on a patient's medical image, or answering visual questions (e.g., 'Where are the nodules in this chest X-ray?'). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. We studied these concepts with 13 radiologists and clinicians, who assessed the VLM concepts as valuable yet articulated many design considerations. Reflecting on our findings, we discuss implications for integrating VLM capabilities in radiology, and for healthcare AI more generally.
