Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models

Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, Stefanie Speidel

TL;DR

This framework improves anatomy awareness by training organ-specific models with an inpainting objective guided by binary segmentation masks, achieving a 15% improvement in segmentation scores when the generated data is combined with real images.

Abstract

In computer-assisted surgery, automatically recognizing anatomical organs is crucial for understanding the surgical scene and providing intraoperative assistance. While machine learning models can identify such structures, their deployment is hindered by the need for labeled, diverse surgical datasets with anatomical annotations. Labeling multiple classes (i.e., organs) in a surgical scene is time-intensive and requires medical experts. Although synthetically generated images can enhance segmentation performance, maintaining both organ structure and texture during generation is challenging. We introduce a multi-stage approach using diffusion models to generate multi-class surgical datasets with annotations. Our framework improves anatomy awareness by training organ-specific models with an inpainting objective guided by binary segmentation masks. The organs are generated with an inference pipeline that uses a pre-trained ControlNet to maintain the organ structure. The synthetic multi-class datasets are constructed through an image composition step, ensuring structural and textural consistency. This versatile approach allows the generation of multi-class datasets from real binary datasets and simulated surgical masks. We thoroughly evaluate the generated datasets on image quality and downstream segmentation, achieving a $15\%$ improvement in segmentation scores when combined with real images. The code is available at https://gitlab.com/nct_tso_public/muli-class-image-synthesis

Paper Structure

This paper contains 20 sections, 5 equations, 14 figures, 13 tables.

Figures (14)

  • Figure 1: The generated multi-class surgical images (Generated images column) for three different surgical datasets (denoted by name on the left side) with their corresponding semantic masks using our diffusion approach. Our approach can generate realistic and diverse organ textures using the segmentation masks as masking and conditioning signals.
  • Figure 2: Overview of the diffusion approach to generate a multi-class dataset. Stage-$1$ involves training the SD inpainting model using the real images and masks for each organ separately. In stage-$2$, a pre-trained ControlNet is plugged into the SSI model (SSI-CN) to precisely generate each anatomical structure using edges extracted from the segmentation mask. The image composition in stage-$3$ cuts out each organ from its generated image and combines the cut-outs to form the multi-class surgical dataset. Stage-$4$ (optional) is an image refinement process that uses SDEdit (Meng et al., 2021) to rectify inconsistencies introduced by the composition operation and to generate the final multi-class images. For the simulated masks, we skip stage-$1$ and start directly with the inference stages to generate the synthetic datasets.
  • Figure 3: The generated images before and after Stage-$4$. White boxes show the inconsistent regions. The junction between two organs is smoothed, while the overall texture of the image is maintained.
  • Figure 4: The generated images using simulated masks (SS). With SS masks, we can generate surgical images that differ from the training datasets: the organ shapes differ, while the organ textures remain similar to those of the real datasets.
  • Figure 5: Image quality comparison on the DSAD dataset. The GAN methods (columns 2-4) fail to generate high-quality images. The diffusion methods (columns 5-8) generate organs with realistic-looking textures, but the spatial alignment to the semantic label is broken. Our method is able to maintain the shape and texture of the different organs.
  • ...and 9 more figures
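
The Figure 2 caption describes two mask-driven operations: extracting edges from a segmentation mask (the stage-2 conditioning signal) and cutting each organ out of its own generated image to compose a multi-class image (stage 3). The following is a minimal NumPy sketch of those two ideas only, not the authors' implementation. The function names `mask_edges` and `compose_multiclass` are hypothetical, the simple 4-neighbour boundary operator is a stand-in for whatever edge extractor the paper actually uses, and the rule that later classes overwrite earlier ones where masks overlap is an assumption.

```python
import numpy as np


def mask_edges(mask):
    """Return the boundary pixels of a binary mask (assumed edge extractor).

    A foreground pixel is 'interior' if its four direct neighbours are also
    foreground; the edge is the foreground minus the interior.
    """
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    return mask & ~interior


def compose_multiclass(generated, masks):
    """Cut each organ out of its own generated image and paste into one canvas.

    generated: dict mapping class id -> H x W x 3 uint8 image
    masks:     dict mapping class id -> H x W binary mask for that organ
    Returns the composed image and the matching multi-class label map.
    Assumption: where masks overlap, the class processed last wins.
    """
    first = next(iter(generated.values()))
    height, width = first.shape[:2]
    image = np.zeros((height, width, 3), dtype=np.uint8)
    labels = np.zeros((height, width), dtype=np.uint8)
    for class_id, organ_image in generated.items():
        region = np.asarray(masks[class_id], dtype=bool)
        image[region] = organ_image[region]  # copy only this organ's pixels
        labels[region] = class_id            # annotation comes for free
    return image, labels
```

Because the label map is written from the same masks used for cutting, the composed image and its annotation stay aligned by construction; any seams at organ junctions are what the optional refinement stage is described as addressing.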