Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels

Chenyu You, Weicheng Dai, Fenglin Liu, Yifei Min, Nicha C. Dvornek, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S. Duncan

TL;DR

This paper introduces Mine yOur owN Anatomy (MONA), a semi-supervised medical image segmentation framework, together with a set of objectives that encourage the model to decompose medical images into a collection of anatomical features in an unsupervised manner.

Abstract

Recent studies on contrastive learning have achieved remarkable performance in medical image segmentation while using only a few labels. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution, so blindly using all pixels in training can lead to data imbalance and degraded performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful yet consistent anatomical features, given the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach that strategically uses the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA), and make three contributions. First, prior work argues that every pixel matters equally to model training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly because the supervision signal is lacking. We show two simple solutions towards learning invariances: stronger data augmentations and nearest neighbors. Second, we construct a set of objectives that encourage the model to decompose medical images into a collection of anatomical features in an unsupervised manner. Lastly, we demonstrate, both empirically and theoretically, the efficacy of MONA on three benchmark datasets, achieving new state-of-the-art results under different labeled semi-supervised settings.

Paper Structure (19 sections, 4 theorems, 19 equations, 11 figures, 6 tables)

Key Result

Theorem A.2

Let $0 < \delta < 1$. With probability at least $1-\delta$ over the distribution of the sample set $\mathcal{S}_n$, for all $g \in \mathcal{G}_{\ell,\mathcal{F}}$ such that $g(\mathbf{x},\mathbf{y}) = \ell(f(\mathbf{x}),\mathbf{y})$, it holds that:
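
For reference, the classical Rademacher-complexity generalization bound of this type reads as follows. This is the textbook form, written under the assumptions that $g$ takes values in $[0,1]$ and that $\mathcal{S}_n = \{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^{n}$ is drawn i.i.d.; the symbol $\mathfrak{R}_n$ for the Rademacher complexity is notation assumed here, and the exact statement and constants of Theorem A.2 may differ:

$$\mathbb{E}\big[g(\mathbf{x},\mathbf{y})\big] \;\le\; \frac{1}{n}\sum_{i=1}^{n} g(\mathbf{x}_i,\mathbf{y}_i) \;+\; 2\,\mathfrak{R}_n\big(\mathcal{G}_{\ell,\mathcal{F}}\big) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}$$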

Figures (11)

  • Figure 1: Examples of three benchmarks (i.e., ACDC, LiTS, MMWHS) with long-tail class distributions. As observed, the ratios of different label classes over three benchmarks are imbalanced.
  • Figure 2: Overview of the MONA framework, which has two stages: (1) GLCon is designed to seek both augmented and mined views for instance discrimination $\mathcal{L}_{\text{inst}}$ in a global and a local manner. Global instance discrimination exploits the correlations among views within the latent feature space generated by the encoders, while local instance discrimination leverages the correlations among views (specifically, local regions of the image) within the output feature space produced by the decoder (see Section \ref{subsection:framework}); (2) our proposed anatomical contrastive reconstruction fine-tuning (see Section \ref{subsection:acr}). Note that U and L denote unlabeled and labeled data.
  • Figure 3: Illustration of the contrastive loss. Intuitively, we actively sample a set of pixel-level anchor representations, pulling them closer to the class-averaged mean of representations within this class (positive keys), and pushing away from representations from other classes (negative keys).
  • Figure 4: Illustration of the equivariance loss.
  • Figure 5: Visualization of segmentation results on ACDC with 5% label ratio. As shown, MONA consistently yields more accurate predictions and better boundary adherence compared to all other SSL methods. The anatomical classes (RV, Myo, LV) are shown in different colors.
  • ...and 6 more figures
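
The contrastive loss described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `class_mean_contrastive_loss`, the InfoNCE-style form, uniform random anchor sampling, and the `num_anchors` and `temperature` defaults are all hypothetical choices made here. The caption speaks of "actively" sampling anchors, which this sketch does not attempt to reproduce.

```python
import numpy as np


def class_mean_contrastive_loss(feats, labels, num_anchors=64,
                                temperature=0.5, seed=0):
    """Pixel-level contrastive loss with class-mean positive keys.

    feats:  (N, D) array of pixel embeddings.
    labels: (N,) integer class ids (ground truth or pseudo-labels).
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # L2-normalise so that dot products are cosine similarities.
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)

    per_anchor_losses = []
    for c in np.unique(labels):
        in_class = labels == c
        pos_feats, neg_feats = feats[in_class], feats[~in_class]
        if len(neg_feats) == 0:
            continue  # no negative keys available for this class

        # Positive key: class-averaged mean representation.
        pos_key = pos_feats.mean(axis=0)
        pos_key = pos_key / (np.linalg.norm(pos_key) + 1e-12)

        # Anchors: a uniform random subset of this class's pixels
        # (a simplification of the sampling described in the caption).
        k = min(num_anchors, len(pos_feats))
        idx = rng.choice(len(pos_feats), size=k, replace=False)
        anchors = pos_feats[idx]

        pos_logit = anchors @ pos_key / temperature        # (k,)
        neg_logits = anchors @ neg_feats.T / temperature   # (k, M)
        logits = np.concatenate([pos_logit[:, None], neg_logits], axis=1)

        # -log softmax of the positive entry, via a stable log-sum-exp.
        m = logits.max(axis=1, keepdims=True)
        log_denom = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
        per_anchor_losses.append(log_denom - pos_logit)

    if not per_anchor_losses:
        return 0.0
    return float(np.mean(np.concatenate(per_anchor_losses)))
```

Minimising this quantity pulls each anchor towards its class mean and pushes it away from pixels of other classes, so embeddings that cluster by class should score lower than embeddings whose labels cut across the clusters.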

Theorems & Definitions (7)

  • Definition A.1: Rademacher complexity
  • Theorem A.2: bartlett2002rademacher
  • Remark A.3
  • Definition A.4: Finite-basis function class
  • Lemma A.5
  • Proposition 1: mobahi2020self
  • Theorem A.6: Theorem 5, mobahi2020self