
Hybrid Diffusion Model for Breast Ultrasound Image Augmentation

Farhan Fuad Abir, Sanjeda Sara Jennifer, Niloofar Yousefi, Laura J. Brattain

Abstract

We propose a hybrid diffusion-based augmentation framework to overcome the critical challenge of data augmentation for breast ultrasound (BUS) datasets. Unlike conventional diffusion-based augmentations, our approach improves visual fidelity and preserves ultrasound texture by combining text-to-image generation with image-to-image (img2img) refinement, as well as fine-tuning with low-rank adaptation (LoRA) and textual inversion (TI). Our method generated realistic, class-consistent images on an open-source Kaggle breast ultrasound image dataset (BUSI). Compared to the Stable Diffusion v1.5 baseline, incorporating TI and img2img refinement reduced the Fréchet Inception Distance (FID) from 45.97 to 33.29, demonstrating a substantial gain in fidelity while maintaining comparable downstream classification performance. Overall, the proposed framework effectively mitigates the low-fidelity limitations of synthetic ultrasound images and enhances the quality of augmentation for robust diagnostic modeling.
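The FID figures quoted above compare the Gaussian statistics of Inception-v3 features extracted from real versus synthetic images. As a minimal sketch of the metric itself (assuming the feature means and covariances have already been computed; the feature-extraction step is omitted), the Fréchet distance between two Gaussians can be evaluated as:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID applies this formula to Inception-v3 feature statistics of the
    real and synthetic image sets:
        ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; discard tiny imaginary
    # parts introduced by numerical error.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical distributions give distance 0; shifting the mean raises it.
mu, sigma = np.zeros(4), np.eye(4)
print(round(frechet_distance(mu, sigma, mu, sigma), 6))
print(round(frechet_distance(mu, sigma, mu + 1.0, sigma), 6))
```

Lower values indicate that the synthetic feature distribution is closer to the real one, which is why the drop from 45.97 to 33.29 corresponds to a fidelity gain.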

Paper Structure

This paper contains 12 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the proposed hybrid diffusion-based image generation framework for breast ultrasound. The method consists of three main stages. (a) The preprocessing stage converts the labels into descriptive prompts. (b) The LoRA fine-tuning and token generation stage adapts Stable Diffusion v1.5 using LoRA for image–prompt alignment and Textual Inversion for learning a domain-specific <ultrasound> token. (c) In the final workflow, the prompts and the learned <ultrasound> token are passed through the fine-tuned text-to-image (text2img) model to generate synthetic images, which are further refined using an image-to-image (img2img) stage with LoRA weights, yielding the final synthetic ultrasound images.
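The three stages in Figure 1 can be sketched in code. The sketch below is illustrative only: the prompt wording, class names, and weight paths are hypothetical, and the stage (b)/(c) calls are shown as comments using the Hugging Face diffusers API since they require large model weights to actually run.

```python
# Illustrative sketch of the Figure 1 workflow; prompt wording and paths
# are hypothetical, not taken from the paper.

def label_to_prompt(label: str) -> str:
    """Stage (a): convert a BUSI class label into a descriptive prompt
    that includes the learned <ultrasound> token."""
    descriptions = {
        "benign": "a <ultrasound> scan of breast tissue with a benign lesion",
        "malignant": "a <ultrasound> scan of breast tissue with a malignant lesion",
        "normal": "a <ultrasound> scan of healthy breast tissue",
    }
    return descriptions[label]

# Stages (b) and (c) with diffusers (commented out; weights are large):
#
# from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
# t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# t2i.load_lora_weights("path/to/lora")                            # LoRA weights
# t2i.load_textual_inversion("path/to/ti", token="<ultrasound>")   # TI token
# draft = t2i(label_to_prompt("benign")).images[0]                 # text2img draft
# i2i = StableDiffusionImg2ImgPipeline(**t2i.components)
# final = i2i(prompt=label_to_prompt("benign"),
#             image=draft, strength=0.5).images[0]                 # img2img refinement

print(label_to_prompt("malignant"))
```

The img2img `strength` parameter (a standard diffusers knob, value here arbitrary) controls how much the refinement stage is allowed to alter the text2img draft.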
  • Figure 2: Comparison of real ultrasound images and synthetic variants generated by four different SD1.5-based approaches. Rows correspond to breast lesion categories: benign (top), malignant (middle), and normal (bottom). Columns show (from left to right): real images, baseline SD1.5 generations, SD1.5 with img2img refinement, SD1.5 with TI, and SD1.5 combined with TI and img2img. The img2img refinement increases fidelity by improving ultrasound textures.