VisionCAD: An Integration-Free Radiology Copilot Framework
Jiaming Li, Junlei Wu, Sheng Wang, Honglin Xiong, Jiangdong Cai, Zihao Zhao, Yitao Zhu, Yuan Yin, Dinggang Shen, Qian Wang
TL;DR
VisionCAD addresses the barrier of integrating AI tools into hospital IT infrastructure for radiology. A camera captures medical images shown on a display, and a six-stage pipeline processes them for diagnostic analysis and automated reporting. The six stages are a Vision Capturer, Screen Detector, Quality Enhancer, Modality Router, Diagnostic Engine, and Report Assistant. The framework uses Ark+ for chest X-rays, Restormer for image restoration, BiomedCLIP for modality routing, and multimodal LLMs for report generation. Across PneumoniaMNIST, OAI, Nodules, and MIMIC-CXR, VisionCAD achieves diagnostic performance close to that of conventional CAD operating on the original images. F1-score degradation is typically under 2%, and report-generation metrics stay within 1% of the original-image baseline. Because it requires no changes to existing IT infrastructure, this integration-free approach eases the deployment of AI-assisted radiology across diverse clinical settings. Its limitations include dependence on capture-device quality, privacy concerns, and the need for broader real-world validation.
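The modular design can be pictured as a chain of stage functions that each take and return a shared case record, with the Modality Router choosing which diagnostic engine runs. This is a minimal sketch under stated assumptions, not the authors' implementation. Every stage here is a hypothetical stand-in. A fixed border crop plays the role of the Screen Detector, and a min-max contrast stretch replaces Restormer. A toy two-number "embedding" compared by cosine similarity against made-up prototypes replaces BiomedCLIP routing. Placeholder scoring functions stand in for models such as Ark+, and a text template replaces the LLM Report Assistant. The names `Case`, `PROTOTYPES`, `ENGINES`, and `run` are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale image as rows of intensities


@dataclass
class Case:
    """State passed from stage to stage (hypothetical structure)."""
    image: Image
    modality: str = ""
    scores: Dict[str, float] = field(default_factory=dict)
    report: str = ""


def detect_screen(case: Case) -> Case:
    # Stand-in for the Screen Detector: crop a fixed 1-pixel border.
    # A real detector would localize and rectify the display region.
    case.image = [row[1:-1] for row in case.image[1:-1]]
    return case


def enhance_quality(case: Case) -> Case:
    # Stand-in for the Quality Enhancer: min-max contrast stretch to [0, 1].
    flat = [v for row in case.image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    case.image = [[(v - lo) / span for v in row] for row in case.image]
    return case


def embed(image: Image) -> List[float]:
    # Hypothetical 2-D feature: mean and standard deviation of intensities.
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    return [mean, std]


def cosine(a: List[float], b: List[float]) -> float:
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


# Made-up prototypes; a CLIP-style router would compare the image embedding
# with text-prompt embeddings instead.
PROTOTYPES: Dict[str, List[float]] = {"chest_xray": [0.5, 0.4], "knee_xray": [0.2, 0.1]}

# Placeholder engines keyed by modality; real task-specific models would plug
# in here. These scores carry no clinical meaning.
ENGINES: Dict[str, Callable[[Image], Dict[str, float]]] = {
    "chest_xray": lambda img: {"abnormal_prob": embed(img)[0]},
    "knee_xray": lambda img: {"abnormal_prob": embed(img)[1]},
}


def route_modality(case: Case) -> Case:
    # Pick the modality whose prototype is most similar to the image feature.
    feat = embed(case.image)
    case.modality = max(PROTOTYPES, key=lambda m: cosine(feat, PROTOTYPES[m]))
    return case


def diagnose(case: Case) -> Case:
    # Dispatch to the engine registered for the routed modality.
    case.scores = ENGINES[case.modality](case.image)
    return case


def write_report(case: Case) -> Case:
    # Stand-in for the LLM-based Report Assistant: a fixed text template.
    findings = ", ".join(f"{k}={v:.2f}" for k, v in sorted(case.scores.items()))
    case.report = f"Modality: {case.modality}. Findings: {findings}."
    return case


PIPELINE: List[Callable[[Case], Case]] = [
    detect_screen, enhance_quality, route_modality, diagnose, write_report,
]


def run(captured: Image) -> Case:
    # `captured` plays the role of the Vision Capturer's output frame.
    case = Case(image=[row[:] for row in captured])
    for stage in PIPELINE:
        case = stage(case)
    return case


photo = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.0],
    [0.0, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(run(photo).report)  # Modality: chest_xray. Findings: abnormal_prob=0.50.
```

Keeping each stage behind the same `Case -> Case` signature is what lets a single component (say, a different diagnostic model for a new modality) be swapped without touching the rest of the chain. That is the kind of modularity the abstract attributes to the framework.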
Abstract
Widespread clinical deployment of computer-aided diagnosis (CAD) systems is hindered by the challenge of integrating them with existing hospital IT infrastructure. Here, we introduce VisionCAD, a vision-based radiological assistance framework that circumvents this barrier by capturing medical images directly from displays using a camera system. The framework operates through an automated pipeline that detects, restores, and analyzes on-screen medical images, transforming camera-captured visual data into diagnostic-quality images suitable for automated analysis and report generation. We validated VisionCAD across diverse medical imaging datasets, demonstrating that its modular architecture can flexibly incorporate state-of-the-art diagnostic models for specific tasks. The system achieves diagnostic performance comparable to conventional CAD systems operating on the original digital images, with F1-score degradation typically below 2% across classification tasks, while natural language generation metrics for automated reports remain within 1% of those derived from the original images. By requiring only a camera device and standard computing resources, VisionCAD offers an accessible approach to AI-assisted diagnosis, enabling the deployment of diagnostic capabilities in diverse clinical settings without modifications to existing infrastructure.
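The abstract's "F1-score degradation typically less than 2%" does not say whether the gap is an absolute drop in percentage points or a relative drop. The short sketch computes both from confusion counts, using F1 = 2TP / (2TP + FP + FN), which is algebraically equal to 2PR / (P + R). The confusion counts are invented purely for illustration and are not results from the paper. The function names are hypothetical.

```python
from typing import Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_degradation(original: Counts, recaptured: Counts) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    f1_orig = f1_from_counts(*original)
    f1_recap = f1_from_counts(*recaptured)
    absolute_pp = 100 * (f1_orig - f1_recap)
    relative_pct = 100 * (f1_orig - f1_recap) / f1_orig if f1_orig else 0.0
    return absolute_pp, relative_pct


# Invented counts for illustration only (not results from the paper):
# original images give F1 = 0.90, camera-recaptured images give F1 ~ 0.884.
abs_pp, rel_pct = f1_degradation(original=(90, 10, 10), recaptured=(88, 11, 12))
print(f"absolute: {abs_pp:.2f} pp, relative: {rel_pct:.2f}%")
# absolute: 1.56 pp, relative: 1.73%
```

For F1 values near 0.9 the two conventions differ only slightly. For lower baseline scores the relative figure grows noticeably larger than the absolute one, so stating which convention is used matters when comparing against a "less than 2%" threshold.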
