CSG: A Context-Semantic Guided Diffusion Approach in De Novo Musculoskeletal Ultrasound Image Generation

Elay Dahan, Hedda Cohen Indelman, Angeles M. Perez-Agosto, Carmit Shiran, Gopal Avinash, Doron Shaked, Nati Daniel

TL;DR

This work introduces Context-Semantic Guidance (CSG), a dual-conditioning diffusion framework for de novo musculoskeletal ultrasound image generation that jointly controls anatomy via semantic masks and texture via context guidance. By combining a fine-tuned StyleGAN mask generator with context-aware texture selection and a paired latent diffusion translator, CSG produces high-fidelity images including pathological findings. Three-fold validation shows improved segmentation performance, higher fidelity to real images (lower FID and related metrics), and realistic appearance in Turing tests, relative to prior methods. An extension enables text-guided geometry editing and texture augmentation to broaden the variability space, potentially enhancing robustness of ultrasound AI systems.

Abstract

The use of synthetic images in medical imaging Artificial Intelligence (AI) solutions has been shown to be beneficial in addressing the limited availability of diverse, unbiased, and representative data. Despite the extensive use of synthetic image generation methods, controlling semantic variability and context details remains challenging, which limits their effectiveness in producing diverse and representative medical image datasets. In this work, we introduce a scalable semantic- and context-conditioned generative model, coined CSG (Context-Semantic Guidance). This dual-conditioning approach allows for comprehensive control over both structure and appearance, advancing the synthesis of realistic and diverse ultrasound images. We demonstrate the ability of CSG to generate findings (pathological anomalies) in musculoskeletal (MSK) ultrasound images. Moreover, we test the quality of the synthetic images using a three-fold validation protocol. The results show that the synthetic images generated by CSG improve the performance of semantic segmentation models, exhibit enhanced similarity to real images compared to the baseline methods, and are indistinguishable from real images according to a Turing test. Furthermore, we demonstrate an extension of CSG that enlarges the variability space of images by synthetically generating augmentations of anatomical geometries and textures.

Paper Structure

This paper contains 19 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) Depiction of the differences between Image Translation GANs and Vanilla GANs. The former create synthetic images based on semantic labels, but their scalability is constrained when semantic labels are limited. In contrast, Vanilla GANs lack semantic information but can generate an unlimited array of synthetic images. Our solution (CSG) is proposed to address both limitations. (b) CSG employs a three-phase generative system. The first phase generates semantic masks of MSK labels from a noise prior $z$ using a fine-tuned StyleGAN architecture. Subsequently, contextually similar images are selected using a neural algorithm of artistic style. Finally, the generated masks and contexts are processed by a paired image translation diffusion model to yield the synthetic ultrasound image. This approach combines the advantages of semantic and context guidance for unlimited and unbiased image generation.
  • Figure 2: Examples of query and contextually similar image pairs. On the left: a query image. On the right: the most visually similar image in the dataset, based on texture style, selected by our context selection method.
  • Figure 3: Example synthetic ultrasound images generated by our CSG method. A semantic mask (on the left) and context image (top row) guide the synthetic image generation of our method (bottom row). Label mapping: Blue steel - muscle, Green - tendon, Light green - bone, Black - background.
  • Figure 4: Turing test results of the MSK synthetic images and a comparison of generated image examples.
  • Figure 5: Visualization of extending the variability space. Our method allows the generation of synthetic images to be controlled by constraining their geometrical semantic properties and guiding their textural context properties. Top row: Text-guided mask editing results. Bottom row: Image-guided local texture editing results.
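
The context selection step described in Figures 1(b) and 2 picks the dataset image whose texture style is closest to a query. A minimal sketch of that idea is given below, assuming style is compared through Gram matrices of feature maps, as in the neural algorithm of artistic style. The function names (`gram_matrix`, `style_distance`, `select_context`) are hypothetical, and the random arrays stand in for feature maps that would in practice come from a pretrained network; the feature extractor, layers, and distance actually used by CSG are not specified here.

```python
import numpy as np


def gram_matrix(features):
    """Return the (C, C) Gram matrix of a (C, H, W) feature map.

    The Gram matrix captures channel co-activation statistics, which
    summarise texture while discarding spatial layout.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def style_distance(feat_a, feat_b):
    """Squared Frobenius distance between two Gram matrices."""
    diff = gram_matrix(feat_a) - gram_matrix(feat_b)
    return float(np.sum(diff ** 2))


def select_context(query_feat, candidate_feats):
    """Index of the candidate whose texture style is closest to the query."""
    distances = [style_distance(query_feat, f) for f in candidate_feats]
    return int(np.argmin(distances))


# Placeholder feature maps (assumption: real ones come from a pretrained CNN).
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 16, 16))
candidates = [rng.normal(size=(8, 16, 16)) * s for s in (3.0, 1.0, 0.2)]
candidates.append(query.copy())  # an exact style match, placed last

best = select_context(query, candidates)
```

Because the Gram matrix averages over spatial positions, two images with different anatomy but similar speckle and echo texture can still be close under this distance, which is the property a context selector needs when the geometry is supplied separately by the semantic mask.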