DiffBoost: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model

Zheyuan Zhang, Lanhong Yao, Bin Wang, Debesh Jha, Gorkem Durak, Elif Keles, Alpay Medetalibeyoglu, Ulas Bagci

TL;DR

DiffBoost tackles data scarcity in medical image segmentation by leveraging a text-guided diffusion model pretrained on RadImageNet and fine-tuned on task-specific data. It incorporates text prompts and edge guidance to generate controlled synthetic images, which are merged with real data at the patch level during segmentation training. The method achieves consistent Dice improvements across ultrasound, CT, and MRI tasks and is validated through extensive ablations on augmentation ratio, hyperparameters, patch size, and backbone architectures. This work demonstrates the practicality of text- and edge-conditioned diffusion as a data-augmentation paradigm for robust medical image segmentation, with potential for broader clinical impact.
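The results above are reported as Dice improvements. The following minimal NumPy sketch shows what that metric measures for a pair of binary masks, $\mathrm{Dice}(P, G) = 2|P \cap G| / (|P| + |G|)$.

The function name `dice_coefficient` and the smoothing term `eps` are illustrative choices, not taken from the DiffBoost code. The paper's exact evaluation protocol (for example, per-slice versus per-volume averaging) is not stated in this summary.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks (hypothetical helper)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps avoids 0/0 and makes two empty masks count as a perfect match (Dice = 1).
    return float((2.0 * intersection + eps) / (total + eps))


pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(pred, gt), 4))  # 0.6667: 2 overlapping pixels, 3 + 3 foreground pixels
```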

Abstract

Large-scale, diverse, high-quality data are crucial for developing robust deep-learning models for medical applications, since they can improve generalization and reduce overfitting. However, high-quality labeled data are often scarce, which presents a significant challenge. To address this challenge, this paper proposes DiffBoost, a controllable diffusion model for medical image synthesis. We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical images that preserve the essential characteristics of the original images, using edge information of objects to guide the synthesis process. This ensures that the synthesized samples adhere to medically relevant constraints and preserve the underlying structure of the imaging data. Because the diffusion model samples randomly, we can generate an arbitrary number of synthetic images with diverse appearances. To validate the effectiveness of the proposed method, we conduct an extensive set of medical image segmentation experiments on multiple datasets, including breast ultrasound (+13.87%), spleen CT (+0.38%), and prostate MRI (+7.78%), achieving improvements over the baseline segmentation methods. These results demonstrate the effectiveness of DiffBoost for medical image segmentation and show the feasibility of a first-ever text-guided diffusion model for general medical image segmentation tasks. Through carefully designed ablation experiments, we investigate the influence of various data augmentations, hyper-parameter settings, the patch size used to generate random merging masks, and the interaction with different network architectures. Source code is available at https://github.com/NUBagciLab/DiffBoost.
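The abstract states that edge information of objects guides the synthesis, but this summary does not name the edge detector. ControlNet-style pipelines commonly condition on Canny edges. The following NumPy-only sketch uses a simpler stand-in, a thresholded Sobel gradient magnitude, to show how an image becomes a binary edge map that could serve as a structural condition.

The function `sobel_edge_map` and its `threshold` parameter are hypothetical and are not the authors' implementation.

```python
import numpy as np


def sobel_edge_map(image: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Binary (0/1) edge map from the Sobel gradient magnitude of a 2D image (illustrative stand-in)."""
    img = image.astype(np.float64)
    # Min-max normalize so a single threshold works across modalities (MRI, CT, US).
    value_range = img.max() - img.min()
    if value_range == 0:
        return np.zeros(img.shape, dtype=np.uint8)  # flat image: no edges
    img = (img - img.min()) / value_range

    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # replicate borders so output size matches input

    def shifted(dy: int, dx: int) -> np.ndarray:
        # Neighbor at row offset dy-1 and column offset dx-1, for every pixel at once.
        return padded[dy:dy + h, dx:dx + w]

    # 3x3 Sobel kernels: horizontal (gx) and vertical (gy) derivatives.
    gx = (shifted(0, 2) + 2 * shifted(1, 2) + shifted(2, 2)) - (
        shifted(0, 0) + 2 * shifted(1, 0) + shifted(2, 0))
    gy = (shifted(2, 0) + 2 * shifted(2, 1) + shifted(2, 2)) - (
        shifted(0, 0) + 2 * shifted(0, 1) + shifted(0, 2))
    magnitude = np.hypot(gx, gy)

    peak = magnitude.max()
    if peak == 0:
        return np.zeros(img.shape, dtype=np.uint8)
    # Keep pixels whose normalized gradient magnitude exceeds the threshold.
    return (magnitude / peak > threshold).astype(np.uint8)


# Toy "organ": a bright square on a dark background.
scan = np.zeros((8, 8))
scan[2:6, 2:6] = 1.0
edges = sobel_edge_map(scan)
print(edges)
```

In this toy case the edge map outlines the square's boundary, while the flat interior and the distant background stay zero. That outline is the kind of structure the synthesized images are meant to preserve.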

Paper Structure (16 sections, 6 equations, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: Our proposed approach involves three stages: (a) training a diffusion model on a comprehensive radiology imaging dataset (RadImageNet), (b) fine-tuning the pre-trained model on a task-specific dataset, allowing for adaptation to the unique characteristics of each target task, and (c) utilizing the fine-tuned model for downstream task training, integrating the synthetic samples (generated during data augmentation) to enhance generalization and performance in the target task (segmentation).
  • Figure 2: The first row shows example original images from RadImageNet. The second row shows the edge maps of these images, which are used in the diffusion process to improve visual quality. The last two rows show sample images generated by the Pix2Pix GAN model and by the fine-tuned ControlNet across diverse modalities. The original and synthesized images share consistent anatomical structures, although their intensities may differ.
  • Figure 3: The first and second columns show original MRI, CT, and US images and their edge maps, respectively; the last four columns show example augmented samples. The augmented samples vary notably in intensity distribution (diversity) while retaining structural integrity.
  • Figure 4: The hyper-parameter $\alpha$ determines the combination balance between original and augmented samples at the patch level.
  • Figure 5: Visual comparison of segmentation results against other augmentation methods. DiffBoost enables the model to produce more consistent anatomical shapes and outperforms the other data augmentation methods.
  • ...and 2 more figures
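Figure 4 and the abstract describe combining original and augmented samples at the patch level. The balance is controlled by the hyper-parameter $\alpha$ and by the patch size of a random merging mask. This summary does not give the exact mixing rule, so the following NumPy sketch assumes one plausible reading:

- the 2D image is split into square patches;
- each patch is taken from the synthetic image with probability $\alpha$;
- every other patch is kept from the original image.

The helpers `random_patch_mask` and `merge_patches` and their default values are hypothetical.

```python
import numpy as np


def random_patch_mask(shape, patch_size, alpha, rng):
    """Binary mask in which each patch_size x patch_size block is 1 with probability alpha."""
    h, w = shape
    grid_h = -(-h // patch_size)  # ceiling division so partial border patches are covered
    grid_w = -(-w // patch_size)
    grid = (rng.random((grid_h, grid_w)) < alpha).astype(np.float64)
    # Expand every grid cell into a full patch, then crop to the image size.
    mask = grid.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    return mask[:h, :w]


def merge_patches(original, synthetic, patch_size=16, alpha=0.5, rng=None):
    """Blend a 2D real image with its synthetic counterpart, patch by patch (hypothetical).

    Patches where the mask is 1 come from `synthetic`; the rest come from `original`.
    The original's segmentation label is reused unchanged, which assumes the synthetic
    image shares the same anatomy (it was generated from the same edge map).
    """
    if original.shape != synthetic.shape:
        raise ValueError("original and synthetic images must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    mask = random_patch_mask(original.shape, patch_size, alpha, rng)
    return mask * synthetic + (1.0 - mask) * original, mask


rng = np.random.default_rng(0)
real = np.zeros((30, 30))   # stand-in for a real slice
fake = np.ones((30, 30))    # stand-in for its diffusion-generated counterpart
merged, mask = merge_patches(real, fake, patch_size=16, alpha=0.5, rng=rng)
print(merged.shape, mask.mean())
```

Under this reading, a larger `patch_size` produces coarser, blockier mixing, and a larger `alpha` shifts each training sample toward synthetic content. The patch-size and hyper-parameter ablations in the abstract probe these two knobs.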