Ultrasound Image Generation using Latent Diffusion Models

Benoit Freiche, Anthony El-Khoury, Ali Nasiri-Sarvi, Mahdi S. Hosseini, Damien Garcia, Adrian Basarab, Mathieu Boily, Hassan Rivaz

TL;DR

This work tackles the scarcity of publicly available ultrasound (US) data by generating synthetic US images with a fine-tuned latent diffusion model. It fine-tunes Stable Diffusion on the BUSI breast US dataset and adds ControlNet to condition outputs on segmentation masks, which gives targeted, controllable generation. Qualitative results show realistic anatomy and pathologies, while quantitative experiments demonstrate that synthetic images can improve downstream classification performance, indicating that the model captures essential US statistics. The study highlights the potential of synthetic US data for training and evaluation, with future directions including memory-efficient fine-tuning, multi-dataset training, physics-based simulations, and broader community release.

Abstract

Diffusion models for image generation have been a subject of increasing interest due to their ability to generate diverse, high-quality images. Image generation has immense potential in medical imaging because open-source medical images are difficult to obtain compared to natural images, especially for rare conditions. The generated images can later be used to train classification and segmentation models. In this paper, we propose simulating realistic ultrasound (US) images by successive fine-tuning of large diffusion models on different publicly available databases. To do so, we fine-tuned Stable Diffusion, a state-of-the-art latent diffusion model, on BUSI (Breast US Images), a breast ultrasound image dataset. We successfully generated high-quality US images of the breast using simple prompts that specify the organ and pathology; the images appeared realistic to three experienced US scientists and a US radiologist. Additionally, we provided user control by conditioning the model on segmentations through ControlNet. We will release the source code at http://code.sonography.ai/ to make fast US image generation available to the scientific community.

Paper Structure

This paper contains 11 sections and 4 figures.

Figures (4)

  • Figure 1: Current results of two image generation models: Stable Diffusion 1.5 (first row) and ChatGPT4o (second row). The prompts used were: (a) "Ultrasound image of breast", (b) "Ultrasound image of breast with a benign lesion", (c) "Ultrasound image of breast with a malignant tumor".
  • Figure 2: Demonstration of the main principles of Stable Diffusion and ControlNet. The ControlNet is a copy of the autoencoder of Stable Diffusion, on which zero-convolution layers are added and progressively learned.
  • Figure 3: Sample results of our fine-tuning on BUSI images. Each row represents a category: (a) normal, (b) benign, (c) malignant. The results appear realistic to US experts and an experienced US radiologist, and the generated images respect their categories (normal, benign, malignant).
  • Figure 4: Samples conditioned by segmentation masks, generated with ControlNet on BUSI. Each row represents a category: (a) normal, (b) benign, (c) malignant. The columns are, from left to right: (1) the input segmentation mask, (2) the original image corresponding to the segmentation (ground truth), (3-6) four generated samples. For the benign and malignant cases, the lesion corresponds to the input segmentation mask.
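
The zero-convolution layers mentioned in the Figure 2 caption can be illustrated with a small sketch. It is not the authors' implementation: it uses plain Python lists instead of a deep learning framework, and the helper names (`conv1x1`, `zero_conv_params`, `controlled_features`) are hypothetical. It shows only the property that makes the layers "progressively learned": a 1x1 convolution whose weights and bias start at zero adds nothing to the base model's features until training moves the weights away from zero.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def conv1x1(weights: Matrix, bias: Vector, pixel: Vector) -> Vector:
    """Apply a 1x1 convolution to the channel vector of a single pixel."""
    return [
        sum(w * x for w, x in zip(row, pixel)) + b
        for row, b in zip(weights, bias)
    ]


def zero_conv_params(channels: int) -> Tuple[Matrix, Vector]:
    """Return weights and bias initialised to zero (the 'zero convolution')."""
    weights = [[0.0] * channels for _ in range(channels)]
    bias = [0.0] * channels
    return weights, bias


def controlled_features(
    base: Vector, control: Vector, weights: Matrix, bias: Vector
) -> Vector:
    """Add the control signal, passed through the 1x1 convolution, to the base features."""
    residual = conv1x1(weights, bias, control)
    return [b + r for b, r in zip(base, residual)]


# Illustrative values only: one pixel with three feature channels.
base = [0.5, -1.0, 2.0]     # features from the frozen base model
control = [1.0, 1.0, 1.0]   # features from the conditioning branch (e.g. a mask)

weights, bias = zero_conv_params(3)
at_init = controlled_features(base, control, weights, bias)

# Simulate a training step that moves one weight away from zero.
weights[0][0] = 0.1
after_update = controlled_features(base, control, weights, bias)
```

At initialisation `at_init` equals `base`, so adding the conditioning branch does not disturb the pretrained model; after the simulated update, only the first channel of `after_update` changes.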