DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification
Ying Jin, Zhuoran Zhou, Haoquan Fang, Jenq-Neng Hwang
TL;DR
The paper addresses the difficulty of robust radiology image understanding under limited data by introducing Diffusion-based Feature Augmentation (DAug), which generates disease-specific heatmaps via a classifier-guided diffusion model and appends them as extra input channels. It further proposes Image-Text-Class Hybrid Contrastive Learning to jointly leverage text reports and class labels, enabling a single model to perform both retrieval and classification. Experimental results on MIMIC-CXR demonstrate state-of-the-art performance in both retrieval and classification tasks, with ablations validating the effectiveness of heatmap augmentation and the unified contrastive loss. The approach is portable to standard pretrained models and holds practical potential for clinical deployment, albeit with notable computational overhead to generate heatmaps.
Abstract
Medical image understanding requires meticulous examination of fine visual details, with particular regions requiring additional attention. While radiologists build such expertise over years of experience, it is challenging for AI models to learn where to look with limited amounts of training data. This limitation results in unsatisfactory robustness in medical image understanding. To address this issue, we propose Diffusion-based Feature Augmentation (DAug), a portable method that improves a perception model's performance with a generative model's output. Specifically, we extend a radiology image to multiple channels, with the additional channels being heatmaps of regions where diseases tend to develop. A diffusion-based image-to-image translation model is used to generate these heatmaps, conditioned on selected disease classes. Our method is motivated by the fact that generative models learn the distribution of normal and abnormal images, and such knowledge is complementary to image understanding tasks. In addition, we propose Image-Text-Class Hybrid Contrastive Learning to utilize both text and class labels. With these two approaches combined, our method surpasses baseline models without changing the model architecture, and achieves state-of-the-art performance on both medical image retrieval and classification tasks.
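The channel-extension step described in the abstract can be illustrated with a minimal sketch. It assumes a single-channel image and a list of same-sized heatmaps, one per selected disease class. The function name `augment_channels`, the 224x224 resolution, the choice of three classes, and the random placeholder arrays are all hypothetical; in the paper the heatmaps come from a classifier-guided diffusion model, which is not reproduced here.

```python
import numpy as np


def augment_channels(image, heatmaps):
    """Stack a grayscale image with per-disease heatmaps along a channel axis.

    image:    (H, W) array, the radiology image.
    heatmaps: list of (H, W) arrays, one per selected disease class
              (assumed to be precomputed elsewhere).
    Returns a (1 + K, H, W) float32 array, image first, then the K heatmaps.
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim != 2:
        raise ValueError("expected a single-channel (H, W) image")

    channels = [image]
    for heatmap in heatmaps:
        heatmap = np.asarray(heatmap, dtype=np.float32)
        if heatmap.shape != image.shape:
            raise ValueError("each heatmap must match the image shape")
        channels.append(heatmap)

    # New leading axis is the channel dimension.
    return np.stack(channels, axis=0)


# Placeholder data only: random arrays stand in for a chest X-ray and for
# diffusion-generated heatmaps of three hypothetical disease classes.
rng = np.random.default_rng(0)
xray = rng.random((224, 224))
fake_heatmaps = [rng.random((224, 224)) for _ in range(3)]

augmented = augment_channels(xray, fake_heatmaps)
print(augmented.shape)  # (4, 224, 224)
```

How the augmented channels are fed to a pretrained backbone is not specified in this summary, so the sketch stops at producing the multi-channel input.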
