SemSim: Revisiting Weak-to-Strong Consistency from a Semantic Similarity Perspective for Semi-supervised Medical Image Segmentation

Shiao Xie, Hongyi Wang, Ziwei Niu, Hao Sun, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

TL;DR

SemSim is a novel semi-supervised framework built on FixMatch and powered by two designs from a semantic similarity perspective. It yields consistent improvements over state-of-the-art methods across three public segmentation benchmarks.

Abstract

Semi-supervised learning (SSL) for medical image segmentation is a challenging yet highly practical task, which reduces reliance on large-scale labeled datasets by leveraging unlabeled samples. Among SSL techniques, the weak-to-strong consistency framework, popularized by FixMatch, has emerged as a state-of-the-art method in classification tasks. Notably, such a simple pipeline has also shown competitive performance in medical image segmentation. However, two key limitations persist, impeding its efficient adaptation: (1) the neglect of contextual dependencies results in inconsistent predictions for similar semantic features, leading to incomplete object segmentation; (2) the lack of exploitation of semantic similarity between labeled and unlabeled data induces a considerable class-distribution discrepancy. To address these limitations, we propose a novel semi-supervised framework based on FixMatch, named SemSim, powered by two appealing designs from a semantic similarity perspective: (1) rectifying pixel-wise predictions by reasoning about the intra-image pair-wise affinity map, thus integrating contextual dependencies explicitly into the final prediction; (2) bridging labeled and unlabeled data via a feature querying mechanism for compact class representation learning, which fully considers cross-image anatomical similarities. As reliable semantic similarity extraction depends on robust features, we further introduce an effective spatial-aware fusion module (SFM) to explore distinctive information from multiple scales. Extensive experiments show that SemSim yields consistent improvements over state-of-the-art methods across three public segmentation benchmarks.
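
For readers unfamiliar with the weak-to-strong consistency pipeline that SemSim builds on, the following is a minimal NumPy sketch of a FixMatch-style pixel-wise loss. It is not the authors' implementation: the function name `weak_to_strong_loss`, the `(C, H, W)` logit layout, and the `threshold=0.95` default are assumptions chosen for illustration, and none of SemSim's own components (affinity-based rectification, feature querying, SFM) are included.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def weak_to_strong_loss(weak_logits, strong_logits, threshold=0.95):
    """FixMatch-style consistency loss for one unlabeled image.

    weak_logits, strong_logits: arrays of shape (C, H, W) holding the
    per-class logits for the weakly and strongly augmented views.
    threshold: confidence cutoff for keeping a pseudo-label (assumed value).
    """
    # Pseudo-labels and their confidence come from the weak view only.
    p_weak = softmax(weak_logits, axis=0)
    pseudo_label = p_weak.argmax(axis=0)   # (H, W)
    confidence = p_weak.max(axis=0)        # (H, W)
    mask = confidence >= threshold         # keep only confident pixels

    # Cross-entropy of the strong-view prediction against the pseudo-labels.
    log_p_strong = np.log(softmax(strong_logits, axis=0) + 1e-12)
    ce = -np.take_along_axis(log_p_strong, pseudo_label[None], axis=0)[0]

    # Average over all pixels; low-confidence pixels contribute zero.
    return float((ce * mask).sum() / mask.size)
```

In training, the weak-view branch would normally be treated as a fixed target (no gradient flows through the pseudo-labels), and this term would be added to a supervised loss on the labeled data.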

Paper Structure

This paper contains 31 sections, 19 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Comparison of state-of-the-art methods with SemSim on the ACDC dataset under different labeled ratios.
  • Figure 2: Comparison of (a) FixMatch with (b) SemSim. $S$ and $W$ are strong and weak augmentations. $y_l$ represents the label and $p^l$ is the prediction of the labeled data.
  • Figure 3: Left: Visualization of class activation maps generated by Grad-CAM for FixMatch and SemSim. (a), (b) and (c) represent features of class Myo, RV, and LV. Right: Kernel density estimations of voxels belonging to the Myo class in the ACDC dataset. FixMatch suffers from empirical distribution mismatch between labeled and unlabeled data, while SemSim effectively narrows the distribution gap.
  • Figure 4: (a) Overview of SemSim framework. (b) The predictions $p^{in}$, $p^{w}_1$ are based on intra-image semantic similarity. (c) The predictions $p^{cr}$, $p^{w}_2$ are based on cross-image semantic similarity.
  • Figure 5: Overview of spatial-aware fusion module. $H$, $W$ represent the height and width of the feature map.
  • ...and 6 more figures