Joint semi-supervised and contrastive learning enables domain generalization and multi-domain segmentation

Alvaro Gomariz, Yusuke Kikuchi, Yun Yvonna Li, Thomas Albrecht, Andreas Maunz, Daniela Ferrara, Huanxiang Lu, Orcun Goksel

TL;DR

SegCLR tackles domain shift in multi-domain retinal OCT fluid segmentation by fusing supervised Dice-based segmentation with contrastive learning in a unified framework on a UNet backbone. It introduces flexible pair generation and spatially aware contrastive projection to preserve segmentation-relevant context, and optimizes a joint loss that leverages labeled and unlabeled data across source and target domains. Through extensive experiments on three OCT datasets, SegCLR demonstrates strong unsupervised domain adaptation and robust domain generalization, even with little to no unlabeled target data, and benefits from multi-domain training. The approach offers a practical, data-efficient path toward generalizable medical image segmentation across diverse devices and disease conditions, with stable performance across random initializations and clear guidance for selecting a reference configuration.
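The joint objective described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation. A soft Dice loss is computed only for slices that have a label mask, a precomputed contrastive term is added for every slice, and the two are combined with a hypothetical weight `lam`. The exact Dice variant and weighting used by SegCLR are not specified in this summary.

```python
from typing import Optional, Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Return 1 - soft Dice between a probability map and a binary mask (both flattened)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # eps keeps the ratio defined when both maps are empty
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def joint_loss(
    contrastive_term: float,
    pred: Optional[Sequence[float]] = None,
    target: Optional[Sequence[float]] = None,
    lam: float = 1.0,  # hypothetical weight on the contrastive term
) -> float:
    """Supervised Dice term (labeled slices only) plus a weighted contrastive term."""
    if pred is not None and target is not None:
        supervised = soft_dice_loss(pred, target)
    else:
        supervised = 0.0  # unlabeled slice: only the contrastive term contributes
    return supervised + lam * contrastive_term


# Usage with toy numbers: one labeled slice, then one unlabeled slice
print(joint_loss(0.3, pred=[0.9, 0.1, 0.8], target=[1, 0, 1]))
print(joint_loss(0.3))  # 0.3
```

Keeping the supervised term conditional on a label is what allows unlabeled source or target slices to contribute through the contrastive term alone.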

Abstract

Despite their effectiveness, current deep learning models struggle with images from different domains that vary in appearance and content. We introduce SegCLR, a versatile framework for segmenting images across different domains that employs supervised and contrastive learning simultaneously to learn effectively from both labeled and unlabeled data. We demonstrate the superior performance of SegCLR through a comprehensive evaluation on three diverse clinical datasets of 3D retinal Optical Coherence Tomography (OCT) images for slice-wise fluid segmentation, with various network configurations and verification across 10 different network initializations. In an unsupervised domain adaptation context, SegCLR achieves results on par with a supervised upper-bound model trained on the intended target domain. Notably, we find that the segmentation performance of SegCLR is only marginally affected by the amount of unlabeled data from the target domain; we therefore also propose an effective domain generalization extension of SegCLR, also known as zero-shot domain adaptation, which eliminates the need for any target domain information. This shows that adding our proposed contrastive loss to standard supervised training for segmentation leads to superior models that are inherently more generalizable to both in- and out-of-domain test data. We additionally propose a pragmatic solution for deploying SegCLR in realistic scenarios with multiple domains containing labeled data. Accordingly, our framework pushes the boundaries of deep-learning-based segmentation in multi-domain applications, regardless of data availability: labeled, unlabeled, or nonexistent.
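The abstract does not name the specific contrastive loss. As a hedged illustration, the sketch below implements the standard SimCLR-style NT-Xent loss, in which each embedding is pulled toward its paired view and repelled from all other embeddings in the batch. The temperature `tau` and the pairing convention (`z[2k]` and `z[2k + 1]` are two views of sample `k`) are our assumptions. In SegCLR the embeddings would presumably come from the encoder $E(\cdot)$ followed by the contrastive projection $C(\cdot)$; here they are treated as plain vectors.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nt_xent(z: List[Sequence[float]], tau: float = 0.5) -> float:
    """Mean NT-Xent loss; z[2k] and z[2k + 1] are two views of the same sample."""
    n = len(z)
    if n < 2 or n % 2 != 0:
        raise ValueError("expected an even number (>= 2) of embeddings")
    total = 0.0
    for i in range(n):
        j = i ^ 1  # positive partner: 0<->1, 2<->3, ...
        # similarities to every other embedding: the positive attracts, the rest repel
        logits = [cosine(z[i], z[k]) / tau for k in range(n) if k != i]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - cosine(z[i], z[j]) / tau
    return total / n


# Usage: matched views give a lower loss than mismatched ones
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mismatched = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent(aligned) < nt_xent(mismatched))  # True
```

A negative-free method such as SimSiam would drop the repelling denominator and instead rely on a predictor head and stop-gradient, which cannot be shown meaningfully without an autodiff framework.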

Paper Structure (18 sections, 5 equations, 9 figures, 11 tables)

Figures (9)

  • Figure 1: Illustration of the Unsupervised Domain Adaptation (UDA) and Domain Generalization frameworks studied herein for SegCLR. The colored ellipses indicate the losses used in training, with variable superscripts denoting the source (s) or target (t) domain.
  • Figure 2: SegCLR architecture employed for joint supervised and contrastive learning. Layers are represented as arrows and their outputs as rectangles. The width and height of these outputs are annotated at the upper left of the rectangles, and the number of features at the bottom. $F(\cdot)$ is the segmentation backbone, $E(\cdot)$ the encoder, and $C(\cdot)$ the contrastive projection. The architectures of $\rho^\mathrm{agg}$ and $\rho^\mathrm{MLP}$ are described in the section on contrastive projection.
  • Figure 3: Illustration of the semi-supervised contrastive learning framework for unsupervised domain adaptation. The SegCLR block corresponds to the architecture in Figure 2. The repel losses are not used by SimSiam. Supervised losses are only used when labeled images exist. While this framework can accommodate any number of labeled images, at least one is required to train the decoder arm of the underlying UNet.
  • Figure 4: Proposed pair generation approaches for contrastive learning with 3D images using SegCLR.
  • Figure 5: Relative segmentation metrics for cross-device domain adaptation, i.e., using $D^s=D_1$ and $D^t=D_2$. The dashed line depicts the average Baseline result, used as the reference for the relative metrics. SVDNA is included in brackets for $D^s$, as in practice the Baseline model would be used instead.
  • ...and 4 more figures
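The Figure 4 caption mentions pair generation approaches for 3D images without detailing them. One generic option, assumed here purely for illustration and not taken from the paper, is to treat two nearby slices of the same volume as a positive pair, on the premise that neighboring slices share much of their anatomy. The helper `sample_slice_pairs` and its `max_offset` parameter are hypothetical.

```python
import random
from typing import List, Tuple


def sample_slice_pairs(
    slices_per_volume: List[int],
    pairs_per_volume: int,
    max_offset: int = 2,  # hypothetical neighborhood size along the slice axis
    seed: int = 0,
) -> List[Tuple[int, int, int]]:
    """Hypothetical: (volume, slice_a, slice_b) pairs with 0 < |a - b| <= max_offset."""
    if max_offset < 1:
        raise ValueError("max_offset must be >= 1 so that a != b is possible")
    rng = random.Random(seed)  # seeded for reproducible pair lists
    pairs: List[Tuple[int, int, int]] = []
    for vol, n in enumerate(slices_per_volume):
        if n < 2:
            continue  # a single-slice volume cannot provide two distinct slices
        for _ in range(pairs_per_volume):
            a = rng.randrange(n)
            lo, hi = max(0, a - max_offset), min(n - 1, a + max_offset)
            b = rng.choice([s for s in range(lo, hi + 1) if s != a])
            pairs.append((vol, a, b))
    return pairs


# Usage: three volumes with 10, 1 and 5 slices; the single-slice volume is skipped
pairs = sample_slice_pairs([10, 1, 5], pairs_per_volume=4)
print(len(pairs))  # 8
```

The resulting index triples could then be loaded, augmented, and passed to a contrastive loss, with the other pairs in the batch serving as negatives.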