DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification

Ying Jin, Zhuoran Zhou, Haoquan Fang, Jenq-Neng Hwang

TL;DR

The paper addresses the difficulty of robust radiology image understanding under limited data by introducing Diffusion-based Feature Augmentation (DAug), which generates disease-specific heatmaps via a classifier-guided diffusion model and appends them as extra input channels. It further proposes Image-Text-Class Hybrid Contrastive Learning to jointly leverage text reports and class labels, enabling a single model to perform both retrieval and classification. Experimental results on MIMIC-CXR demonstrate state-of-the-art performance in both retrieval and classification tasks, with ablations validating the effectiveness of heatmap augmentation and the unified contrastive loss. The approach is portable to standard pretrained models and holds practical potential for clinical deployment, albeit with notable computational overhead to generate heatmaps.

Abstract

Medical image understanding requires meticulous examination of fine visual details, with particular regions requiring additional attention. While radiologists build such expertise over years of experience, it is challenging for AI models to learn where to look with limited amounts of training data. This limitation results in unsatisfactory robustness in medical image understanding. To address this issue, we propose Diffusion-based Feature Augmentation (DAug), a portable method that improves a perception model's performance with a generative model's output. Specifically, we extend a radiology image to multiple channels, with the additional channels being the heatmaps of regions where diseases tend to develop. A diffusion-based image-to-image translation model is used to generate such heatmaps conditioned on selected disease classes. Our method is motivated by the fact that generative models learn the distribution of normal and abnormal images, and such knowledge is complementary to image understanding tasks. In addition, we propose Image-Text-Class Hybrid Contrastive Learning to utilize both text and class labels. With these two novel approaches combined, our method surpasses baseline models without changing the model architecture, and achieves state-of-the-art performance on both medical image retrieval and classification tasks.
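
The channel-extension step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `difference_heatmap` and `build_daug_input` are hypothetical, the absolute difference and the max-normalization are assumed choices, and the diffusion model is replaced by synthetic arrays standing in for its translated outputs.

```python
import numpy as np


def difference_heatmap(original, translated, eps=1e-8):
    """Absolute pixel difference, rescaled to [0, 1].

    Taking the absolute value and dividing by the maximum are assumptions;
    the source only says the heatmap comes from the input/output difference.
    """
    diff = np.abs(translated.astype(np.float32) - original.astype(np.float32))
    return diff / (diff.max() + eps)


def build_daug_input(original, translated_images):
    """Stack the grayscale image with one heatmap per translated image.

    original: (H, W) array; translated_images: list of (H, W) arrays.
    Returns an array of shape (1 + len(translated_images), H, W).
    """
    heatmaps = [difference_heatmap(original, t) for t in translated_images]
    return np.stack([original.astype(np.float32)] + heatmaps, axis=0)


# Synthetic stand-ins for a chest X-ray and two diffusion outputs.
rng = np.random.default_rng(0)
xray = rng.random((8, 8), dtype=np.float32)
diseased = xray.copy()
diseased[2:4, 2:4] += 0.5  # pretend the model altered this region
healthy = xray.copy()      # pretend the model changed nothing

daug = build_daug_input(xray, [diseased, healthy])
print(daug.shape)  # (3, 8, 8)
```

Because the result has three channels when two heatmaps are used, it has the same shape as an RGB image, which is consistent with the claim that a standard pretrained encoder can consume it without architectural changes.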

Paper Structure

This paper contains 20 sections, 4 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Diffusion-based Feature Augmentation (DAug) pipeline. The original image is translated into a diseased or healthy version with a classifier-guided diffusion model. The upper row shows an example where an image with a healthy heart is turned into a cardiomegaly (enlarged heart) version. The difference between the input and output images produces a heatmap highlighting the potential area of the corresponding disease. In the case of cardiomegaly, the heatmap correctly highlights the boundary of the heart. The heatmaps are added to the original monochrome radiology image as additional image channels, resulting in an augmented input feature that can improve the performance of downstream tasks. The DAug features support multiple disease categories (two examples in the second row), and our softmax-based approach generates more accurate heatmaps than the existing baseline. Green and orange bounding boxes indicate correctly and wrongly highlighted regions, respectively.
  • Figure 2: Example output of abnormality heatmaps generated by DAug. We use two chest X-rays as examples, each covering four types of abnormalities (cardiomegaly, consolidation, etc., one per column). The plus ($+$) and minus ($-$) signs indicate the direction of the classifier gradient, meaning amplifying the disease and reducing the disease, respectively. "+ No Findings" reduces the probability of all potential diseases. For each input, the first row shows the output heatmap guided by gradients of the softmax probabilities, and the second row shows the results guided by gradients of the sigmoid probabilities. The green bounding boxes show that our method correctly highlights the region of the disease, which can help the model establish better image-text correspondence. Also, the softmax gradient is better guidance than the sigmoid gradient, as softmax successfully removes false positives (see orange circles). The orange circle in the second row highlights false-positive areas of consolidation, and the orange circle in the last row highlights a wrong activation of lung lesion when it is supposed to detect pleural effusion. The corresponding softmax version (green box) makes correct detections.
  • Figure 3: Model architecture and Image-Text-Class Hybrid Contrastive Loss. The inputs are pairs of radiology reports and DAug features (3-channel images including both medical image and abnormality heatmap channels). The image and text encoders are initialized from pretrained CLIP. $R$ and $C$ are text embeddings for the reports and class prompts, respectively. The hybrid contrastive loss includes both the image-text CLIP loss and the image-class contrastive loss. Blue cells are positive pairs, in which the image-class matchings are derived from ground-truth class labels.
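
The hybrid loss named in the Figure 3 caption combines an image-text term with an image-class term. The sketch below shows one plausible form of such a loss; it is not the paper's exact formulation. The symmetric InfoNCE form of the image-text term, the averaging over positive classes, the temperature of 0.07, and the weight `class_weight` are all assumptions, and every function name here is hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(img, rep, temperature=0.07):
    """Symmetric InfoNCE: the i-th image is paired with the i-th report."""
    logits = l2_normalize(img) @ l2_normalize(rep).T / temperature
    idx = np.arange(len(img))
    image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (image_to_text + text_to_image)


def image_class_loss(img, cls, labels, temperature=0.07):
    """Image-to-class term with positives taken from ground-truth labels.

    labels: (N, K) multi-hot matrix with at least one positive per row.
    Averaging the log-probability over positives is an assumption.
    """
    logits = l2_normalize(img) @ l2_normalize(cls).T / temperature
    logp = log_softmax(logits, axis=1)
    per_image = -(labels * logp).sum(axis=1) / labels.sum(axis=1)
    return per_image.mean()


def hybrid_loss(img, rep, cls, labels, class_weight=1.0):
    """Sum of the image-text and image-class terms (weighting is assumed)."""
    return clip_loss(img, rep) + class_weight * image_class_loss(img, cls, labels)


# Random stand-ins for encoder outputs: 4 images, 4 reports (R), 3 class prompts (C).
rng = np.random.default_rng(0)
images = rng.normal(size=(4, 16))
reports = rng.normal(size=(4, 16))
class_prompts = rng.normal(size=(3, 16))
labels = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]], dtype=float)

loss = hybrid_loss(images, reports, class_prompts, labels)
print(float(loss))
```

Sharing one image embedding across both terms is what would let a single model serve retrieval (ranking reports by similarity) and classification (ranking class prompts by similarity), as the TL;DR describes.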