Domain Generalization for Endoscopic Image Segmentation by Disentangling Style-Content Information and SuperPixel Consistency

Mansoor Ali Teevno, Rafael Martinez-Garcia-Pena, Gilberto Ochoa-Ruiz, Sharib Ali

TL;DR

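This work proposes style-content disentanglement using instance normalization and instance selective whitening (ISW), combined with the superpixel-based SUPRA method, to improve domain generalization in endoscopic image segmentation. It reports gains over the baseline and three state-of-the-art methods on the EndoUDA polyp dataset, and a gain of nearly 2% over the second-best method on the EndoUDA Barrett's Esophagus dataset.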

Abstract

Frequent monitoring is necessary to stratify individuals based on their likelihood of developing gastrointestinal (GI) cancer precursors. In clinical practice, white-light imaging (WLI) and complementary modalities such as narrow-band imaging (NBI) and fluorescence imaging are used to assess risk areas. However, conventional deep learning (DL) models show degraded performance due to the domain gap when a model is trained on one modality and tested on a different one. In our earlier approach, we used a superpixel-based method referred to as "SUPRA" to effectively learn domain-invariant information using color and space distances to generate groups of pixels. One of the main limitations of this earlier work is that the aggregation does not exploit structural information, making it suboptimal for segmentation tasks, especially for polyps and heterogeneous color distributions. Therefore, in this work, we propose an approach for style-content disentanglement using instance normalization and instance selective whitening (ISW) for improved domain generalization when combined with SUPRA. We evaluate our approach on two datasets: EndoUDA Barrett's Esophagus and EndoUDA polyps, and compare its performance with three state-of-the-art (SOTA) methods. Our findings demonstrate a notable enhancement in performance compared to both baseline and SOTA methods across the target domain data. Specifically, our approach exhibited improvements of 14%, 10%, 8%, and 18% over the baseline and three SOTA methods on the polyp dataset. Additionally, it surpassed the second-best method (EndoUDA) on the Barrett's Esophagus dataset by nearly 2%.

Paper Structure

This paper contains 12 sections, 7 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Sample images from the EndoUDA dataset [celik_endouda_2021]. Left: images acquired with white-light imaging (WLI). Right: narrow-band imaging (NBI) frames. Both polyp and Barrett's Esophagus (BE) cases are shown.
  • Figure 2: Block diagram of our proposed model. The input image is provided both to the segmentation network and to Simple Linear Iterative Clustering (SLIC) [achanta_slic_2012], which computes superpixels. We take intermediate feature maps from the ResNet50 backbone and apply the ISW transformation to disentangle style-content information. The output prediction mask is then combined with the superpixel grid, and two loss objectives are computed and combined into $\mathcal{L}_{SLIC}$: 1) the superpixel-guided loss, which assesses how closely the mask follows the superpixel boundaries (a red circle indicates a segmentation that does not follow the object edges, while a blue checkmark indicates a border that is carefully followed); 2) the binary cross-entropy loss, which measures the overall performance of the network (a green checkmark indicates good accuracy, while a red cross shows poor performance).
  • Figure 3: Instance selective whitening (ISW) block. The original input image and its photometrically transformed version are passed through the backbone architecture (ResNet50), and feature covariance matrices are computed at three different layers of ResNet50. The variance across the two covariance matrices is computed to determine the mask matrix, which is then used to selectively whiten the feature covariance and disentangle style-content information.
  • Figure 4: Effect of combined loss: evaluating the impact of combining the loss of the proposed model with the superpixel-guided loss. The top part shows that SUPRA [martinez2023supra] fails to delineate object boundaries, which can lead to poor segmentation output, while our proposed architecture correctly segments the object of interest. The superpixel grids in the proposed model are obtained at $k=500$.
  • Figure 5: Qualitative comparison: frames from the tested models on the BE and polyp target-domain datasets, comparing DeepLabv3+ [chen2018encoder], IBN-Net [pan2018two], RobustNet [choi2021robustnet], EndoUDA [celik_endouda_2021], and our proposed method. The predictions of our proposed model are closer to the ground truth than those of the other methods.
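
The ISW block described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the quantile-based rule for building the mask, and the channel-mixing matrix used as a stand-in for the effect of a photometric transform on backbone features are all assumptions made for illustration. The sketch follows the steps named in the caption: compute a feature covariance for the original and the transformed input, take the variance across the two covariances, mark the most style-sensitive entries with a mask, and penalize only those entries.

```python
import numpy as np


def instance_standardize(feat, eps=1e-5):
    """Standardize each channel of one feature map (C, H, W) to zero mean
    and unit variance, i.e. instance normalization without affine terms."""
    flat = feat.reshape(feat.shape[0], -1)          # (C, H*W)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    return (flat - mean) / (std + eps)


def feature_covariance(feat):
    """Channel covariance (C, C) of an instance-standardized feature map."""
    x = instance_standardize(feat)
    return x @ x.T / x.shape[1]


def style_sensitive_mask(cov_a, cov_b, top_fraction=0.3):
    """Boolean mask over strictly upper-triangular covariance entries whose
    variance across the two views is highest.
    ASSUMPTION: a quantile threshold is used here for simplicity."""
    mean = 0.5 * (cov_a + cov_b)
    var = 0.5 * ((cov_a - mean) ** 2 + (cov_b - mean) ** 2)
    upper = np.triu(np.ones(var.shape, dtype=bool), k=1)
    threshold = np.quantile(var[upper], 1.0 - top_fraction)
    return upper & (var >= threshold)


def isw_loss(cov, mask):
    """Mean absolute value of the masked covariance entries. Driving these
    entries toward zero whitens only the style-sensitive correlations."""
    return np.abs(cov[mask]).sum() / max(mask.sum(), 1)


# Toy usage with random "backbone features" (hypothetical data).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))
# ASSUMPTION: a random channel-mixing matrix stands in for how a
# photometric transform of the input perturbs intermediate features.
mix = np.eye(8) + 0.3 * rng.normal(size=(8, 8))
features_aug = np.einsum("ij,jhw->ihw", mix, features)

cov_orig = feature_covariance(features)
cov_aug = feature_covariance(features_aug)
mask = style_sensitive_mask(cov_orig, cov_aug)
loss = 0.5 * (isw_loss(cov_orig, mask) + isw_loss(cov_aug, mask))
print(f"masked entries: {int(mask.sum())}, ISW loss: {loss:.4f}")
```

In a training setup, a term of this kind would be computed at each selected backbone layer and added to the segmentation objective. Only the masked entries are penalized, so correlations that stay stable under the photometric change, which are assumed to carry content, are left untouched.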