Background Matters: A Cross-view Bidirectional Modeling Framework for Semi-supervised Medical Image Segmentation

Luyang Cao, Jianwei Li, Yinghuan Shi

TL;DR

This work addresses the foreground-centric bias of semi-supervised medical image segmentation by introducing Cross-view Bidirectional Modeling (CVBM), which explicitly models background regions as an auxiliary signal to strengthen foreground predictions. The framework uses a teacher–student paradigm with a shared encoder and dual decoders, a mixing layer, and a bidirectional consistency loss that enforces alignment between foreground and background predictions. The authors formalize auxiliary background labels, develop a region-wide supervision strategy, and show, both theoretically and empirically, that high-certainty background predictions raise the confidence of the corresponding foreground predictions and improve segmentation accuracy across four benchmarks (LA, NIH-Pancreas, ACDC, HRF). CVBM achieves state-of-the-art results and, on the Pancreas dataset, surpasses the fully supervised baseline (DSC: 84.57% vs. 83.89%) while using only 20% of the labeled data and incurring no extra inference cost. This approach offers a practical boost for label-efficient medical image analysis and suggests broader applicability in cross-view learning and active labeling strategies.

Abstract

Semi-supervised medical image segmentation (SSMIS) leverages unlabeled data to reduce reliance on manually annotated images. However, current SOTA approaches predominantly focus on foreground-oriented modeling (i.e., segmenting only the foreground region) and have largely overlooked the potential benefits of explicitly modeling the background region. Our study theoretically and empirically demonstrates that highly certain predictions in background modeling enhance the confidence of corresponding foreground modeling. Building on this insight, we propose the Cross-view Bidirectional Modeling (CVBM) framework, which introduces a novel perspective by incorporating background modeling to improve foreground modeling performance. Within CVBM, background modeling serves as an auxiliary perspective, providing complementary supervisory signals to enhance the confidence of the foreground model. Additionally, CVBM introduces an innovative bidirectional consistency mechanism, which ensures mutual alignment between foreground predictions and background-guided predictions. Extensive experiments demonstrate that our approach achieves SOTA performance on the LA, Pancreas, ACDC, and HRF datasets. Notably, on the Pancreas dataset, CVBM outperforms fully supervised methods (i.e., DSC: 84.57% vs. 83.89%) while utilizing only 20% of the labeled data. Our code is publicly available at https://github.com/caoluyang0830/CVBM.git.

Paper Structure

This paper contains 40 sections, 35 equations, 16 figures, 10 tables, 1 algorithm.

Figures (16)

  • Figure 1: Motivation of the proposed approach. In some cases, background modeling exhibits higher predictive confidence than foreground modeling. The upper panel illustrates the conceptual definitions of foreground and background modeling, while the lower panel shows the predictions from each modeling scheme. Both the foreground and background models were trained with VNet (Milletari et al., 2016) on the LA dataset.
  • Figure 2: Overview of our proposed method. Models shown in gray indicate stop-gradient operations.
  • Figure 3: Background label settings. The inversion operation transforms binary label representations, converting background label values from 0 to 1 and foreground label values from 1 to 0. For single-target background labels, this operation is applied directly. For multi-target background labels, one-hot encoding is performed prior to inversion.
  • Figure 4: Pre-training process of the teacher model. Only labeled data are used to pre-train the teacher network. The network processes the cut-mix inputs ($X^{a}$, $X^{b}$) and performs three core tasks: foreground modeling, background modeling, and mixing, which generate the predictions $Q_\text{fg}^a$ and $Q_\text{fg}^b$, $Q_\text{bg}^a$ and $Q_\text{bg}^b$, and $Q_\text{M}^a$ and $Q_\text{M}^b$, respectively. Optimization minimizes the foreground segmentation loss ($\mathcal{L}_\text{fg}$), the background segmentation loss ($\mathcal{L}_\text{bg}$), and the mixed prediction loss ($\mathcal{L}_\text{M}$).
  • Figure 5: Cut-mix process of labeled data. The augmented images exchange foreground and background regions. The zero-valued region in the mask $\mathcal{M}$ has size $\beta w \times \beta h \times \beta d$.
  • ...and 11 more figures
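
The captions of Figures 3 and 5 describe two small data operations: building a background label by inversion, and building a cut-mix mask with a zero-valued box. The following is a minimal NumPy sketch of both, not the authors' implementation. The function names, the `num_classes` argument, the random placement of the box, and the mixing rule `xa * mask + xb * (1 - mask)` are assumptions made for illustration; the captions state only the inversion rule and the box size $\beta w \times \beta h \times \beta d$.

```python
import numpy as np


def background_label(label, num_classes=2):
    """Build a background label by inversion (hypothetical helper).

    Single-target (binary) labels: swap 0 and 1 directly.
    Multi-target labels: one-hot encode first, then invert each channel.
    """
    label = np.asarray(label)
    if num_classes == 2:
        return 1 - label
    # One-hot encode integer class ids: result has shape (..., C).
    one_hot = np.eye(num_classes, dtype=label.dtype)[label]
    # Move the class axis to the front: shape (C, ...).
    one_hot = np.moveaxis(one_hot, -1, 0)
    return 1 - one_hot


def cutmix_mask(shape, beta, rng):
    """Mask of ones with a zero-valued box of size beta*w x beta*h x beta*d.

    The box position is drawn uniformly at random (an assumption).
    """
    w, h, d = shape
    bw, bh, bd = int(beta * w), int(beta * h), int(beta * d)
    x0 = rng.integers(0, w - bw + 1)
    y0 = rng.integers(0, h - bh + 1)
    z0 = rng.integers(0, d - bd + 1)
    mask = np.ones(shape, dtype=np.float32)
    mask[x0:x0 + bw, y0:y0 + bh, z0:z0 + bd] = 0
    return mask


def cutmix(xa, xb, mask):
    """Keep xa where the mask is 1 and paste xb inside the zero-valued box.

    This mixing rule is assumed, not taken from the captions.
    """
    return xa * mask + xb * (1 - mask)
```

Calling `cutmix(xa, xb, mask)` and `cutmix(xb, xa, mask)` with the same mask would give a pair of mixed volumes in which the two inputs exchange the boxed region, which is one plausible reading of the exchange described in Figure 5.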