SurgicaL-CD: Generating Surgical Images via Unpaired Image Translation with Latent Consistency Diffusion Models

Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Stefanie Speidel

TL;DR

This work introduces SurgicaL-CD, a consistency-distilled diffusion method that generates realistic surgical images in only a few sampling steps without paired data, and shows that it outperforms GAN- and diffusion-based approaches.

Abstract

Computer-assisted surgery (CAS) systems are designed to assist surgeons during procedures, thereby reducing complications and enhancing patient care. Training machine learning models for these systems requires a large corpus of annotated datasets, which is challenging to obtain in the surgical domain due to patient privacy concerns and the significant labeling effort required from doctors. Previous methods have explored unpaired image translation using generative models to create realistic surgical images from simulations. However, these approaches have struggled to produce high-quality, diverse surgical images. In this work, we introduce SurgicaL-CD, a consistency-distilled diffusion method to generate realistic surgical images with only a few sampling steps without paired data. We evaluate our approach on three datasets, assessing the generated images in terms of quality and utility as downstream training datasets. Our results demonstrate that our method outperforms GANs and diffusion-based approaches. Our code is available at https://gitlab.com/nct_tso_public/gan2diffusion.

Paper Structure (17 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Realistic surgical images ($3^{rd}$-$5^{th}$ columns) generated in an unpaired fashion by our few-step diffusion approach, shown with their corresponding semantic labels. The diffusion models are trained on the real images ($1^{st}$ column), and the simulated images ($2^{nd}$ column) are used as inputs during inference. There is no one-to-one spatial correspondence between the real and simulated domains. Our approach is able to add fine details, such as vessels, similar to those in real images.
  • Figure 2: Our unpaired few-step diffusion method. In the $1^{st}$ stage, a Stable Diffusion (SD) model [rombach2022high] is fine-tuned with text prompts on each real surgical dataset; after training, this model can generate surgical images. In the $2^{nd}$ stage, the fine-tuned model undergoes consistency distillation using real surgical images and text prompts. We call the resulting model SurgicaL-CD; it generates surgical images in a few steps given a text prompt. Finally, the simulated images are given as input to SurgicaL-CD, which uses SDEdit [meng2021sdedit] to translate them into realistic surgical images. To preserve the structure of different organs, a pre-trained ControlNet [zhang2023adding] is optionally used in the inference pipeline.
  • Figure 3: Comparison to baselines on the Cholec80 dataset. The translated images from our method are compared to state-of-the-art GAN and diffusion methods. Our method adds finer details such as vessels and generates realistic organ textures directly during translation. The pre-texturing patterns remain visible after translation in LC-SD [kaleta2024minimal], while the GAN approaches produce lower image quality.
  • Figure 4: Comparison to baselines on the CholecT50 dataset. The $3^{rd}$ column shows the simulated images after OT mapping. LC-SD [kaleta2024minimal] fails to adapt to the color and texture of real images. Texture transfer does not occur when DPM-Solver++ [lu2022dpm] is used with SDEdit [meng2021sdedit] for $3$ or $10$ steps. $20$-step SDEdit shows good image quality, but the absence of structure control leads to a hallucinated gallbladder ($2^{nd}$ row, white box). Our method ($1$ and $2$ steps) maintains the shape and transfers fine texture details from real images.
  • Figure 5: Ablation results on our $1$-step method. Without ControlNet [zhang2023adding], the edges of the organs and surgical tools are smoothed out (indicated with white boxes). Semantic style leakage occurs with depth-conditioned ControlNets (CNs), whereas edge control maintains the structure and style of the generated images, similar to the real images.
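
The inference procedure described in the Figure 2 caption (noise the simulated input as in SDEdit, then map it back in a few consistency steps) can be illustrated with a toy sketch. This is not the authors' implementation: it works on a list of scalar "pixels" rather than Stable Diffusion latents, it omits text prompts and ControlNet, and the distilled network is replaced by an analytic stand-in, namely the exact consistency function for data drawn from a 1-D Gaussian. The noise schedule, the function names, and the values of `mu` and `sigma` are all assumptions made for illustration.

```python
import math
import random


def alpha_bar(t):
    """Hypothetical cosine-style schedule: alpha_bar(0) = 1, alpha_bar(1) ~ 0."""
    return math.cos(0.5 * math.pi * t) ** 2


def add_noise(x, t, rng):
    """SDEdit-style forward noising of x to noise level t in [0, 1]."""
    a = math.sqrt(alpha_bar(t))
    s = math.sqrt(1.0 - alpha_bar(t))
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x]


def toy_consistency_fn(x_t, t, mu=0.6, sigma=0.1):
    """Stand-in for the distilled model (assumption, not the paper's network).

    For "real" data distributed as N(mu, sigma^2), this is the exact map from
    a noisy sample at level t to the start of its probability-flow ODE path.
    It is the identity at t = 0, as a consistency function must be.
    """
    a = math.sqrt(alpha_bar(t))
    std_t = math.sqrt(a * a * sigma * sigma + 1.0 - a * a)
    return [mu + sigma * (v - a * mu) / std_t for v in x_t]


def translate(x_sim, timesteps=(0.5,), seed=0):
    """Translate a simulated input toward the toy "real" domain.

    The first timestep noises the simulated input (SDEdit); every further
    timestep re-noises the current estimate (multistep consistency sampling).
    """
    rng = random.Random(seed)
    x = list(x_sim)
    for t in timesteps:
        x = add_noise(x, t, rng)
        x = toy_consistency_fn(x, t)
    return x


if __name__ == "__main__":
    simulated = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(translate(simulated, timesteps=(0.5,)))       # 1-step translation
    print(translate(simulated, timesteps=(0.5, 0.25)))  # 2-step translation
```

In this sketch, the first entry of `timesteps` plays the role of the SDEdit strength: a larger value discards more of the simulated input before the consistency step pulls the sample toward the "real" distribution.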