VisionCAD: An Integration-Free Radiology Copilot Framework

Jiaming Li, Junlei Wu, Sheng Wang, Honglin Xiong, Jiangdong Cai, Zihao Zhao, Yitao Zhu, Yuan Yin, Dinggang Shen, Qian Wang

TL;DR

VisionCAD tackles the integration barrier for AI radiology by capturing on-screen images with a camera and processing them through a six-stage pipeline that supports diagnostic analysis and automated reporting. The framework combines a Vision Capturer, Screen Detector, Quality Enhancer, Modality Router, Diagnostic Engine, and Report Assistant, using Ark+ for chest X-rays, Restormer for restoration, BiomedCLIP for modality routing, and multimodal LLMs for reporting. Across PneumoniaMNIST, OAI, Nodules, and MIMIC-CXR, VisionCAD achieves diagnostic performance close to that of conventional CAD operating on original images: F1-score degradation is typically under 2%, and natural language generation metrics for automated reports stay within 1% of those obtained from original images. This integration-free approach enables AI-assisted radiology in diverse clinical settings without modifying existing IT infrastructure, though it remains limited by device quality, privacy concerns, and the need for broader real-world validation.
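The six-stage design amounts to composing independent stages, with a routing step that picks which diagnostic model to run. The sketch below is a minimal illustration under stated assumptions, not the authors' code: `Pipeline`, `route_modality`, the stub stages, and the toy two-dimensional embeddings are all hypothetical. Routing is done CLIP-style, comparing the restored image's embedding against one text-prompt embedding per modality by cosine similarity; a real deployment would take these embeddings from BiomedCLIP's encoders and plug Restormer, Ark+, and an LLM into the enhance, diagnose, and report slots.

```python
from dataclasses import dataclass
from typing import Callable, Dict

import numpy as np


def route_modality(image_emb: np.ndarray, prompt_embs: Dict[str, np.ndarray]) -> str:
    """Return the modality whose prompt embedding is most cosine-similar to the image."""
    img = image_emb / np.linalg.norm(image_emb)
    scores = {
        name: float(img @ (emb / np.linalg.norm(emb)))
        for name, emb in prompt_embs.items()
    }
    return max(scores, key=scores.__getitem__)


@dataclass
class Pipeline:
    # Hypothetical stage interfaces; the Vision Capturer supplies `frame` to run().
    detect: Callable[[np.ndarray], np.ndarray]       # Screen Detector: crop the image region
    enhance: Callable[[np.ndarray], np.ndarray]      # Quality Enhancer: restore the crop
    embed: Callable[[np.ndarray], np.ndarray]        # image encoder used for routing
    prompt_embs: Dict[str, np.ndarray]               # one text embedding per modality
    engines: Dict[str, Callable[[np.ndarray], str]]  # Diagnostic Engine per modality
    report: Callable[[str, str], str]                # Report Assistant

    def run(self, frame: np.ndarray) -> str:
        crop = self.detect(frame)
        restored = self.enhance(crop)
        modality = route_modality(self.embed(restored), self.prompt_embs)
        finding = self.engines[modality](restored)
        return self.report(modality, finding)


# Toy usage with stub stages (all values are illustrative, not from the paper).
pipe = Pipeline(
    detect=lambda f: f[8:56, 8:56],
    enhance=lambda x: np.clip(x, 0.0, 1.0),
    embed=lambda x: np.array([0.9, 0.1]),
    prompt_embs={"chest_xray": np.array([1.0, 0.0]), "knee_xray": np.array([0.0, 1.0])},
    engines={"chest_xray": lambda x: "no pneumonia", "knee_xray": lambda x: "KL grade 2"},
    report=lambda m, f: f"[{m}] {f}",
)
print(pipe.run(np.zeros((64, 64))))  # prints "[chest_xray] no pneumonia"
```

Keeping each stage behind a plain callable is what lets a modular design swap in a stronger diagnostic model for one modality without touching capture, detection, or restoration.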

Abstract

Widespread clinical deployment of computer-aided diagnosis (CAD) systems is hindered by the challenge of integrating with existing hospital IT infrastructure. Here, we introduce VisionCAD, a vision-based radiological assistance framework that circumvents this barrier by capturing medical images directly from displays using a camera system. The framework operates through an automated pipeline that detects, restores, and analyzes on-screen medical images, transforming camera-captured visual data into diagnostic-quality images suitable for automated analysis and report generation. We validated VisionCAD across diverse medical imaging datasets, demonstrating that our modular architecture can flexibly utilize state-of-the-art diagnostic models for specific tasks. The system achieves diagnostic performance comparable to conventional CAD systems operating on original digital images, with an F1-score degradation typically less than 2% across classification tasks, while natural language generation metrics for automated reports remain within 1% of those derived from original images. By requiring only a camera device and standard computing resources, VisionCAD offers an accessible approach for AI-assisted diagnosis, enabling the deployment of diagnostic capabilities in diverse clinical settings without modifications to existing infrastructure.
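The F1-degradation figure compares one classifier on two paired test sets: the original digital images and the same images after VisionCAD capture and restoration. A minimal sketch of that comparison follows. Two choices here are assumptions not stated in the abstract: F1 is macro-averaged over classes, and degradation is the absolute difference in percentage points. The toy labels are invented and deliberately tiny, so a single flipped prediction produces a large drop.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, where F1_c = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def f1_degradation(
    y_true: Sequence[int],
    pred_original: Sequence[int],
    pred_visioncad: Sequence[int],
) -> float:
    """Drop in macro-F1, in percentage points, when the same model sees
    VisionCAD-processed images instead of the original digital images."""
    return 100.0 * (macro_f1(y_true, pred_original) - macro_f1(y_true, pred_visioncad))


# Toy labels: one class-1 case flips to class 0 after capture and restoration.
y_true = [0, 0, 1, 1, 1, 2]
pred_original = [0, 0, 1, 1, 1, 2]
pred_visioncad = [0, 0, 1, 1, 0, 2]
print(round(f1_degradation(y_true, pred_original, pred_visioncad), 2))  # 13.33
```

If the paper instead reports relative degradation or a different averaging scheme, only `f1_degradation` and the averaging step in `macro_f1` would change.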

Paper Structure

This paper contains 19 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Integration paradigms for radiological diagnosis. (a) Conventional CAD systems require complex integration with hospital IT infrastructure, while VisionCAD bypasses these challenges through direct image capture. (b) The VisionCAD workflow proceeds through six components: (1) Vision Capturer captures images from displays; (2) Screen Detector isolates the medical image; (3) Quality Enhancer restores image quality; (4) Modality Router identifies the image type; (5) Diagnostic Engine performs analysis; (6) Report Assistant generates clinical reports.
  • Figure 2: Setup of the Vision Capturer and its field-of-view characteristics. The Kinect's wide-angle lens (90° horizontal, 59° vertical) captures entire monitor displays from the standard radiologist viewing distance of 50–60 cm.
  • Figure 3: The dataset synthesis process for fine-tuning our medical image localization model. First, we collect 50 common radiology information system screenshots and manually annotate regions dedicated to displaying medical images. Next, we randomly insert diverse radiological images into these templates, synthesizing 10K training samples with bounding box annotations.
  • Figure 4: Quality Enhancer performance evaluation across different restoration methods. (a) We evaluated seven restoration methods (UNet, MSEC, GRL, UFormer, SwinIR, UHDFormer, Restormer) on four medical imaging datasets (PneumoniaMNIST, Nodule, OAI, MIMIC-CXR). Image quality is assessed using PSNR and SSIM metrics (SSIM values multiplied by 100 for display consistency). (b) Visual comparisons illustrate how different methods correct capture-related degradations. Within each row, images progress from Captured (raw detector output) through the seven restoration methods to Original (ground truth). Zoomed-in regions support detailed inspection of restoration effectiveness.
  • Figure 5: Performance evaluation using modality-specific SOTA models. Ark+ represents the current SOTA for chest X-ray diagnosis, while ViT fine-tuning is employed for tasks without established foundation models. All models were trained on Original images and evaluated on both Original and VisionCAD test sets.
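Figure 2's claim that a 90° × 59° field of view covers a whole monitor at 50–60 cm can be sanity-checked with ideal pinhole geometry, where a full field-of-view angle θ at distance d spans 2·d·tan(θ/2). The sketch below assumes a hypothetical 24-inch 16:9 monitor (about 53.1 × 29.9 cm) and ignores the lens distortion a real wide-angle camera introduces. Helper names such as `fits_in_view` are illustrative only.

```python
import math


def covered_extent_cm(distance_cm: float, fov_deg: float) -> float:
    """Extent of a fronto-parallel plane seen by an ideal pinhole camera:
    2 * d * tan(fov / 2). Lens distortion is ignored."""
    return 2.0 * distance_cm * math.tan(math.radians(fov_deg) / 2.0)


def monitor_size_cm(diagonal_in: float, aspect_w: float = 16, aspect_h: float = 9):
    """Width and height (cm) of a flat display from its diagonal and aspect ratio."""
    scale = diagonal_in * 2.54 / math.hypot(aspect_w, aspect_h)
    return aspect_w * scale, aspect_h * scale


def fits_in_view(distance_cm: float, width_cm: float, height_cm: float,
                 h_fov_deg: float = 90.0, v_fov_deg: float = 59.0) -> bool:
    """True if the display fits inside the camera's horizontal and vertical coverage."""
    return (covered_extent_cm(distance_cm, h_fov_deg) >= width_cm
            and covered_extent_cm(distance_cm, v_fov_deg) >= height_cm)


w, h = monitor_size_cm(24)  # hypothetical 24-inch 16:9 monitor
for d in (50, 60):
    print(d, round(covered_extent_cm(d, 90.0), 1), round(covered_extent_cm(d, 59.0), 1),
          fits_in_view(d, w, h))
```

At 50 cm the view spans about 100 cm horizontally and 57 cm vertically, so under these assumptions the monitor occupies only part of the captured frame, which is consistent with the need for a Screen Detector stage to localize the image region.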