Large-scale Long-tailed Disease Diagnosis on Radiology Images

Qiaoyu Zheng; Weike Zhao; Chaoyi Wu; Xiaoman Zhang; Lisong Dai; Hengyu Guan; Yuehua Li; Ya Zhang; Yanfeng Wang; Weidi Xie

TL;DR

RadDiag introduces a Transformer-based, multimodal radiology foundation model capable of ingesting arbitrary numbers of 2D and 3D scans across modalities for case-level, multi-label disease diagnosis. The RP3D-DiagDS dataset, drawn from Radiopaedia, provides over 40k cases spanning 9 modalities and 7 anatomies with 5,568 disorders mapped to 930 ICD-10-CM codes, enabling long-tailed learning. A knowledge-enhanced training pipeline supervises a vision encoder with a medical-text backbone to improve discrimination among rare diseases, and a fusion module aggregates multi-scan information at case level. Empirical results show strong internal performance ($\approx 95\%$ AUC), substantial zero-shot and fine-tuning transfer to external benchmarks, and robust generalization across modalities and anatomies, underscoring the value of publicly available medical data for building generalist AI in healthcare.

Abstract

Developing a generalist radiology diagnosis system can greatly enhance clinical diagnostics. In this paper, we introduce RadDiag, a foundational model supporting 2D and 3D inputs across various modalities and anatomies, using a transformer-based fusion module for comprehensive disease diagnosis. Due to patient privacy concerns and the lack of large-scale radiology diagnosis datasets, we utilize high-quality, clinician-reviewed radiological images available online with diagnosis labels. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders (930 unique ICD-10-CM codes). Experimentally, our RadDiag achieves 95.14% AUC on internal evaluation with the knowledge-enhancement strategy. Additionally, RadDiag can be zero-shot applied or fine-tuned to external diagnosis datasets sourced from various hospitals, demonstrating state-of-the-art results. In conclusion, we show that publicly shared medical data on the Internet is a tremendous and valuable resource that can potentially support building a generalist AI for healthcare.
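The headline metric above is AUC on a multi-label task. As a minimal sketch of how such a number can be computed, the snippet below evaluates a per-class AUC with the pairwise (Mann-Whitney) definition and averages it over classes. The function names `binary_auc` and `macro_auc`, the toy inputs, and the choice of an unweighted average over classes are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(label_matrix: List[List[int]], score_matrix: List[List[float]]) -> float:
    """Unweighted mean of per-class AUCs (rows are cases, columns are classes).

    Assumption: classes lacking either positives or negatives are skipped.
    """
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        col_labels = [row[c] for row in label_matrix]
        col_scores = [row[c] for row in score_matrix]
        if 0 < sum(col_labels) < len(col_labels):
            per_class.append(binary_auc(col_labels, col_scores))
    if not per_class:
        raise ValueError("no class has both positive and negative cases")
    return sum(per_class) / len(per_class)


# Toy example: four cases, two disease classes (hypothetical values).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.5, 0.1]]
toy_macro_auc = macro_auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for readability; on a dataset of this size a rank-based implementation would be the practical choice.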

Paper Structure (27 sections, 8 equations, 8 figures, 9 tables)

INTRODUCTION
RESULTS
DISCUSSION
METHODS
Dataset Construction
Model Design
Architecture
Knowledge-enhanced Training
Evaluation Details
The Internal Test-Set
The External Test-Sets
Baselines For Comparison
Metrics
Implementation
CONCLUSION
...and 12 more sections

Figures (8)

Figure 1: Overview of RP3D-DiagDS. There are 39,026 cases (192,675 scans) across 7 human anatomy regions and 9 diverse modalities covering 930 ICD-10-CM codes.
Figure 1: ROC curves on nine classes in the "head" category of disorders. FM and KE are short for Fusion Module and Knowledge Enhancement, respectively.
Figure 2: The data distribution analysis on RP3D-DiagDS. a The distribution of imaging modalities of abnormal (left) and normal (right) cases in RP3D-DiagDS. Each label is annotated with the class name, number of cases, and the corresponding proportion. b The distribution of imaging anatomies of abnormal (left) and normal (right) cases in RP3D-DiagDS. c Case distribution on image numbers. The bar plot shows the distribution of the number of images per case. In RP3D-DiagDS, each case may include multiple images from patient history scans, different modalities, and different angles or conditions. d Case distribution on classes. We demonstrate the long-tailed distributions for disorder and ICD-10-CM classes. We also group these classes into three categories, "head class", "body class" and "tail class", based on the number of cases. Notably, to better show the main part of the case distributions, we clip the axes, as indicated by the dotted axis lines.
Figure 3: ROC curves on Disorders and ICD-10-CM, including the head, medium, and tail parts respectively. The shaded region shows the 95% confidence interval (CI). FM and KE are short for Fusion Module and Knowledge Enhancement, respectively. More ROC curves for each class are shown in Supplementary Section F.
Figure 4: The prediction probability distribution for different classes and saliency map visualization. a Based on the anatomy of each case, we split the cases into positive cases, intra-negative cases, which are located in the same anatomy as the positive ones, and inter-negative cases, which are located in other anatomies. The classification threshold score in the figure denotes the cutoff used to convert the soft probabilities into binary true/false diagnosis results. The first three probability distribution plots depict three relatively successful classes, where the model clearly separates out the inter-negative cases while the intra-negative cases are more confusing. We then show two ordinary classes. As the distributions show, most errors are caused by the intra-negative cases, and the inter-negative cases are again easily dismissed. Finally, we show a failure case where the model can hardly distinguish the positive cases from the negative ones, regardless of whether they are intra-negative or inter-negative. b Saliency map of the key frames. Red indicates the areas that the model focuses on when inferring the corresponding disease category, which suggests that RadDiag can localize lesions or abnormal regions.
...and 3 more figures
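
The Figure 4 caption describes splitting the cases for one disease class into positive, intra-negative, and inter-negative groups, then applying a classification threshold to the predicted probabilities. A minimal sketch of that bookkeeping follows. The `Case` record, the function names, the single `target_anatomy` per class, and the 0.5 threshold are hypothetical simplifications for illustration, not the authors' analysis code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    anatomy: str         # anatomy region of the case, e.g. "head"
    has_disease: bool    # ground-truth label for the class under study
    probability: float   # model's predicted probability for that class


def split_by_anatomy(cases: List[Case], target_anatomy: str) -> Dict[str, List[Case]]:
    """Group cases into positive, intra-negative, and inter-negative sets.

    Assumption: the class under study is tied to one anatomy, `target_anatomy`.
    """
    groups: Dict[str, List[Case]] = {
        "positive": [],
        "intra_negative": [],
        "inter_negative": [],
    }
    for case in cases:
        if case.has_disease:
            groups["positive"].append(case)
        elif case.anatomy == target_anatomy:
            groups["intra_negative"].append(case)
        else:
            groups["inter_negative"].append(case)
    return groups


def error_rates(groups: Dict[str, List[Case]], threshold: float) -> Dict[str, float]:
    """Fraction of misclassified cases per non-empty group at a given threshold.

    A positive is an error when its probability falls below the threshold;
    a negative is an error when its probability reaches or exceeds it.
    """
    rates: Dict[str, float] = {}
    for name, members in groups.items():
        if not members:
            continue
        if name == "positive":
            wrong = sum(c.probability < threshold for c in members)
        else:
            wrong = sum(c.probability >= threshold for c in members)
        rates[name] = wrong / len(members)
    return rates


# Toy cases (hypothetical values) for a class located in the head.
toy_cases = [
    Case("head", True, 0.9),
    Case("head", True, 0.3),
    Case("head", False, 0.6),
    Case("head", False, 0.2),
    Case("chest", False, 0.1),
]
toy_groups = split_by_anatomy(toy_cases, "head")
toy_rates = error_rates(toy_groups, threshold=0.5)
```

Reporting the error rate separately for each group makes the pattern in the caption visible: a class can look accurate overall while most of its mistakes come from negatives in the same anatomy.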
