Tackling domain generalization for out-of-distribution endoscopic imaging

Mansoor Ali Teevno, Gilberto Ochoa-Ruiz, Sharib Ali

TL;DR

This work exploits both style and content information in images, using instance normalization and feature covariance mapping to preserve robust, generalizable feature representations. It also introduces a restitution module within the feature-learning ResNet backbone that retains useful task-relevant features.
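As a minimal illustration of why instance normalization strips "style": normalizing each channel of each image by its own spatial mean and variance cancels any per-channel change in brightness and contrast. The NumPy sketch below assumes (N, C, H, W) feature maps; the function name `instance_norm` and the toy style shift are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def instance_norm(features: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each channel of each sample by its own spatial mean and variance.

    features: array of shape (N, C, H, W).
    """
    mean = features.mean(axis=(2, 3), keepdims=True)  # (N, C, 1, 1)
    var = features.var(axis=(2, 3), keepdims=True)    # (N, C, 1, 1)
    return (features - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
content = rng.normal(size=(1, 3, 8, 8))

# Hypothetical per-channel contrast/brightness shift: a crude stand-in for a
# modality change (e.g. WLI vs. NBI), not a model of real endoscope optics.
scale = np.array([2.5, 1.5, 3.0]).reshape(1, 3, 1, 1)
shift = np.array([0.7, -0.2, 1.1]).reshape(1, 3, 1, 1)
styled = scale * content + shift

# After IN, the two versions should be numerically indistinguishable.
print(np.allclose(instance_norm(content), instance_norm(styled), atol=1e-4))
```

The flip side, which motivates the restitution module, is that IN also discards any of these statistics that were genuinely useful for segmentation.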

Abstract

While recent advances in deep learning (DL) for surgical scene segmentation have yielded promising results on single-center and single-imaging-modality data, these methods usually do not generalize well to unseen distributions or modalities. Although human experts can recognize the same visual appearances across such shifts, DL methods often fail to do so when data samples do not follow a similar distribution. Current literature addressing domain gaps caused by modality changes has focused primarily on natural scene data. However, these methods cannot be directly applied to endoscopic data, as visual cues in such data are more limited than in natural scenes. In this work, we exploit both style and content information in images by performing instance normalization and feature covariance mapping to preserve robust and generalizable feature representations. Additionally, to avoid the risk of removing salient feature representations associated with objects of interest, we introduce a restitution module within the feature-learning ResNet backbone that retains useful task-relevant features. Our proposed method shows a 13.7% improvement over the baseline DeepLabv3+ and nearly an 8% improvement over recent state-of-the-art (SOTA) methods on the target (different modality) set of the EndoUDA polyp dataset. Similarly, our method achieves a 19% improvement over the baseline and 6% over the best-performing SOTA method on the EndoUDA Barrett's esophagus (BE) dataset.
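The abstract does not spell out the feature covariance mapping. Assuming it resembles instance-selective whitening (as in RobustNet), with "WT" in Figure 2 denoting a whitening transformation, a minimal NumPy sketch computes the channel covariance of a feature map, selects the entries that change most between a raw and a transformed view, and penalizes them so the covariance moves toward the identity. The function names, the quantile threshold (a simple stand-in for whatever selection rule the authors use), and the toy "illumination" transform are all hypothetical.

```python
import numpy as np

def channel_covariance(feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """C x C covariance of one (C, H, W) feature map after per-channel standardization."""
    c = feat.shape[0]
    x = feat.reshape(c, -1)
    # Standardize each channel first, as instance normalization would.
    x = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return x @ x.T / x.shape[1]

def domain_sensitive_mask(feat_raw, feat_aug, quantile=0.7):
    """Select upper-triangular covariance entries that shift most between the
    raw and the transformed view (hypothetical, simplified selection rule)."""
    diff = np.abs(channel_covariance(feat_raw) - channel_covariance(feat_aug))
    c = diff.shape[0]
    upper = np.triu(np.ones((c, c), dtype=bool), k=1)
    threshold = np.quantile(diff[upper], quantile)
    return upper & (diff >= threshold)

def whitening_loss(feat, mask=None):
    """Mean absolute off-diagonal covariance, optionally restricted to `mask`.
    Driving it to zero pushes the covariance toward the identity."""
    cov = channel_covariance(feat)
    c = cov.shape[0]
    selected = np.triu(np.ones((c, c), dtype=bool), k=1)
    if mask is not None:
        selected &= mask
    return np.abs(cov[selected]).sum() / max(int(selected.sum()), 1)

rng = np.random.default_rng(0)
independent = rng.normal(size=(4, 16, 16))
shared = rng.normal(size=(1, 16, 16))
correlated = shared + 0.1 * rng.normal(size=(4, 16, 16))
print(whitening_loss(independent), whitening_loss(correlated))

# Hypothetical second view: a shared "illumination" pattern added to all channels.
illumination = rng.normal(size=(1, 16, 16))
augmented = independent + 0.8 * illumination
mask = domain_sensitive_mask(independent, augmented)
print(int(mask.sum()), whitening_loss(independent, mask))
```

Strongly correlated channels yield a much larger loss than nearly independent ones; restricting the penalty to a mask is what makes the whitening "selective", leaving covariance entries that do not react to the transform untouched.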

Paper Structure

This paper contains 11 sections, 9 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Sample images from the EndoUDA dataset. Images on the left were acquired with white light imaging (WLI), and those on the right are narrow-band imaging (NBI) frames, for both polyps and Barrett's esophagus (BE) celik_endouda_2021.
  • Figure 2: Block diagram of our proposed method for generalizable surgical scene segmentation. A. The overall flow of the method trained on two datasets. Initially, the encoder takes two images, i.e., the raw image and a transformed image, and feeds the intermediate features to the SRW block. B. The SNR block jin2021style selectively retains the features useful for generalization, while the selectively applied WT suppresses domain-specific features and preserves domain-invariant ones. Finally, the decoder block performs up-sampling to produce the segmentation output.
  • Figure 3: Qualitative results. The top two rows show results on target (NBI) modality Barrett's esophagus data, and the bottom two rows show results on target (NBI) modality polyp data.
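Figure 2(B) credits the SNR block (jin2021style) with restoring task-relevant information that normalization removed. Following the general SNR formulation, the residual R = F - IN(F) is split by a channel-attention gate a into R+ = a·R and R- = (1 - a)·R, and only R+ is added back to the normalized features. The NumPy sketch below uses random weights; the function name `snr_block`, the reduction ratio `r`, and the weight shapes are assumptions, and the loss that trains the gate in SNR is omitted.

```python
import numpy as np

def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def snr_block(feat, w1, w2):
    """feat: (N, C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    normalized = instance_norm(feat)
    residual = feat - normalized                      # information removed by IN
    pooled = residual.mean(axis=(2, 3))               # (N, C) global average pooling
    hidden = np.maximum(pooled @ w1.T, 0.0)           # (N, C // r), ReLU
    gate = sigmoid(hidden @ w2.T)[:, :, None, None]   # (N, C, 1, 1), values in (0, 1)
    r_plus = gate * residual                          # part restored (task-relevant)
    r_minus = (1.0 - gate) * residual                 # part discarded (task-irrelevant)
    return normalized + r_plus, normalized + r_minus, r_plus, r_minus

rng = np.random.default_rng(0)
c, r = 8, 2
feat = rng.normal(loc=1.0, scale=2.0, size=(2, c, 6, 6))
w1 = rng.normal(scale=0.5, size=(c // r, c))
w2 = rng.normal(scale=0.5, size=(c, c // r))
enhanced, contaminated, r_plus, r_minus = snr_block(feat, w1, w2)
print(enhanced.shape)  # (2, 8, 6, 6)
```

Because the gate is a soft per-channel split, R+ and R- always sum back to the full residual; learning decides how much of each channel's removed statistics is worth restoring.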