
RobustSurg: Tackling domain generalisation for out-of-distribution surgical scene segmentation

Mansoor Ali, Maksim Richards, Gilberto Ochoa-Ruiz, Sharib Ali

TL;DR

RobustSurg tackles domain generalisation in surgical scene segmentation under cross-centre and cross-modality shifts. It introduces a Domain-invariant Feature Encoder (DIFE) that combines Style Normalization and Restitution (SNR) with Instance Selective Whitening (ISW) to remove style variations while preserving discriminative content, and it contributes a new multicentre dataset, HeiCholeSeg. The method achieves state-of-the-art mean IoU on the in-distribution (IID) CholecSeg8K dataset and significant improvements on the out-of-distribution (OOD) HeiCholeSeg, EndoUDA, and cataract datasets, demonstrating robust generalisation at a modest increase in computational cost. The work provides a valuable benchmark and a practical approach for reliable surgical scene understanding in diverse clinical settings.

Abstract

While recent advances in deep learning for surgical scene segmentation have demonstrated promising results on single-centre, single-modality data, these methods usually do not generalise to unseen distributions (i.e., data from other centres) or unseen modalities. Generalisation to out-of-distribution (OOD) data and domain gaps caused by modality changes have been widely studied, but mostly for natural scene data. These methods cannot be applied directly to surgical scenes, which offer limited visual cues and often far more diverse scenarios than natural scenes. Inspired by this work on natural scenes, we hypothesise that exploiting the style and content information in surgical scenes could minimise appearance variation, making the learned representation less sensitive to sudden changes such as blood or imaging artefacts. This can be achieved with instance normalisation and feature covariance mapping, which yield robust and generalisable feature representations. Further, to avoid removing salient features associated with the objects of interest, we introduce a restitution module within the ResNet feature-learning backbone that retains useful task-relevant features. To tackle the lack of multiclass and multicentre data for surgical scene segmentation, we also provide a newly curated dataset that can be vital for addressing generalisability in this domain. Trained on CholecSeg8K, our proposed RobustSurg obtained nearly 23% improvement over the DeepLabv3+ baseline and 10-32% improvement over SOTA methods in mean IoU on the unseen-centre HeiCholeSeg dataset. Similarly, RobustSurg obtained nearly 22% improvement over the baseline and nearly 11% improvement over a recent SOTA method on the target set of the EndoUDA polyp dataset.
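
To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch of instance normalisation and of the per-image channel covariance that whitening-style losses act on. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `(N, C, H, W)` layout, and the `mask` argument (a stand-in for whatever rule selects style-sensitive channel pairs) are all hypothetical.

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Standardise every channel of every image over its spatial dimensions.

    feat: array of shape (N, C, H, W).
    """
    mean = feat.mean(axis=(2, 3), keepdims=True)
    var = feat.var(axis=(2, 3), keepdims=True)
    return (feat - mean) / np.sqrt(var + eps)

def channel_covariance(feat):
    """Per-image covariance between channels, shape (N, C, C)."""
    n, c, h, w = feat.shape
    flat = feat.reshape(n, c, h * w)
    flat = flat - flat.mean(axis=2, keepdims=True)
    return flat @ flat.transpose(0, 2, 1) / (h * w)

def masked_covariance_loss(cov, mask):
    """Mean absolute covariance over the channel pairs selected by `mask`.

    mask: (C, C) array of 0/1 values. How the pairs are selected is an
    assumption of this sketch and is left to the caller.
    """
    selected = np.abs(cov) * mask
    return selected.sum() / max(mask.sum() * cov.shape[0], 1)

# Toy features with a strong per-image offset and scale ("style").
features = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(2, 4, 8, 8))
normalised = instance_norm(features)
covariance = channel_covariance(normalised)
# Placeholder mask: penalise every off-diagonal pair once.
pair_mask = np.triu(np.ones((4, 4)), k=1)
loss = masked_covariance_loss(covariance, pair_mask)
```

After instance normalisation each channel has roughly zero mean and unit variance per image, so the diagonal of the covariance is close to one and only the cross-channel terms remain for a whitening loss to suppress.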

Paper Structure

This paper contains 21 sections, 9 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Overview of the domain shift problem in minimally invasive surgery (MIS). a. The top section shows the intra-source domain variations that we aim to exploit in learning a generalisable model, while the bottom section shows the shift between source and target domains. b. Comparison of intensity histograms between the training source domain and target domains from different centres.
  • Figure 2: Comparative illustration of existing DG approaches and our proposed approach. Current instance normalization (IN) and instance whitening (IW) techniques (a-c) perform feature standardization and minimise global distribution variance but also lose some useful discriminative information (shown with faded symbols). Our proposed domain-invariant feature encoder (DIFE) block aims to restore the features lost during the IN stage. Note that this figure shows features from two classes for simplicity, but the idea applies to all classes.
  • Figure 3: Block diagram of the RobustSurg method for generalisable surgical scene segmentation. The encoder takes two images, i.e., the raw image and a transformed image. Conventional segmentors such as DeepLabv3+ [chen2018encoder] underperform on unseen images, as shown in the prediction. In our approach, we introduce a domain-invariant feature encoder (DIFE) module containing two sub-blocks. The SNR block [jin2021style] normalises the features and recovers the lost ones, while the ISW block selectively suppresses the style information.
  • Figure 4: Examples from the HeiCholeSeg multicentre dataset. Each image is shown along with its corresponding semantic segmentation mask. Images depict varying photometric properties and aspect ratios across the surgical centres.
  • Figure 5: Features from intermediate backbone layers are fed to the SNR block. SNR applies instance normalization (IN) to the input features, followed by channel attention to restore the information lost due to IN. A dual causality loss encourages the disentanglement between useful and contaminated features.
  • ...and 7 more figures
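
The normalise-then-restore idea in the Figure 5 caption can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the paper's code: the squeeze-and-excitation style gate, the weight matrices `w1` and `w2` (random here, learned in practice), and the function name `snr_block` are hypothetical, and the dual causality loss is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def snr_block(feat, w1, w2, eps=1e-5):
    """Instance-normalise features, then restore part of what was removed.

    feat: (N, C, H, W) features from an intermediate backbone layer.
    w1:   (C, C_reduced) and w2: (C_reduced, C) channel-attention weights
          (hypothetical; these would be learned during training).
    Returns (restored, contaminated), both of shape (N, C, H, W).
    """
    mean = feat.mean(axis=(2, 3), keepdims=True)
    var = feat.var(axis=(2, 3), keepdims=True)
    normed = (feat - mean) / np.sqrt(var + eps)

    # Whatever instance normalisation removed from the input.
    residual = feat - normed

    # Channel attention: global average pool, two linear maps, sigmoid gate.
    pooled = residual.mean(axis=(2, 3))              # (N, C)
    hidden = np.maximum(pooled @ w1, 0.0)            # ReLU, (N, C_reduced)
    gate = sigmoid(hidden @ w2)[:, :, None, None]    # (N, C, 1, 1)

    restored = normed + gate * residual              # keeps the useful part
    contaminated = normed + (1.0 - gate) * residual  # keeps the rest
    return restored, contaminated

rng = np.random.default_rng(0)
features = rng.normal(loc=2.0, scale=4.0, size=(2, 8, 6, 6))
w1 = rng.normal(size=(8, 2))
w2 = rng.normal(size=(2, 8))
restored, contaminated = snr_block(features, w1, w2)
```

Because the gate and its complement sum to one, the two outputs together account for the whole residual; a training loss would then push the task-relevant part into `restored` and the style-related part into `contaminated`.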