PCLMix: Weakly Supervised Medical Image Segmentation via Pixel-Level Contrastive Learning and Dynamic Mix Augmentation

Yu Lei, Haolun Luo, Lituan Wang, Zhenwei Zhang, Lei Zhang

TL;DR

PCLMix tackles scribble-based weakly supervised medical image segmentation by jointly addressing the absence of structural priors and the discrete distribution of class features. It introduces a heterogeneous dual-decoder backbone and a four-step training pipeline that combines dynamic mix augmentation, uncertainty-guided pixel-level contrastive learning, and dual consistency regularization, formalized by the total loss $\mathcal{L}_{total}=\ell_{sup}+\lambda_{ctr}\ell_{ctr}+\lambda_{con}\ell_{con}$. A memory-queue based pixel-level contrastive objective and uncertainty maps guide anchor selection, while dual consistency enforces agreement across heterogeneous decoders and mixed augmentations. Evaluated on the ACDC dataset, PCLMix achieves competitive performance with state-of-the-art scribble-based methods and substantially narrows the gap to fully supervised approaches, with code released for reproducibility.
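The total loss above is a plain weighted sum of three terms. The following minimal sketch shows only that combination; the function name and the default weights are illustrative assumptions, not values taken from the paper.

```python
def total_loss(l_sup, l_ctr, l_con, lambda_ctr=1.0, lambda_con=1.0):
    """Combine the three loss terms as
    L_total = l_sup + lambda_ctr * l_ctr + lambda_con * l_con.

    The default weights of 1.0 are placeholders (an assumption); the
    summary above does not state the values used by the authors.
    """
    return l_sup + lambda_ctr * l_ctr + lambda_con * l_con


# Example with made-up loss values and weights.
example = total_loss(1.0, 2.0, 4.0, lambda_ctr=0.5, lambda_con=0.25)
```

Setting `lambda_ctr` or `lambda_con` to zero switches off the corresponding term, which is one simple way to run an ablation on each component.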

Abstract

In weakly supervised medical image segmentation, the absence of structural priors and the discreteness of the class feature distribution pose a challenge: how can supervision signals be propagated accurately from local to global regions without spreading excessively into irrelevant regions? To address this, we propose a novel weakly supervised medical image segmentation framework named PCLMix, comprising dynamic mix augmentation, pixel-level contrastive learning, and consistency regularization strategies. Specifically, PCLMix is built upon a heterogeneous dual-decoder backbone and addresses the absence of structural priors through dynamic mix augmentation during training. To handle the discrete distribution of class features, PCLMix incorporates pixel-level contrastive learning based on prediction uncertainty, enhancing the model's ability to separate pixels of different classes while keeping pixels of the same class consistent. Furthermore, to reinforce segmentation consistency and robustness, PCLMix employs an auxiliary decoder for dual consistency regularization. In the inference phase, the auxiliary decoder is dropped, so no additional computational cost is incurred. Extensive experiments on the ACDC dataset demonstrate that PCLMix appropriately propagates local supervision signals to the global scale, further narrowing the gap between weakly supervised and fully supervised segmentation methods. Our code is available at https://github.com/Torpedo2648/PCLMix.

Paper Structure (28 sections, 13 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: The core idea of uncertainty-guided pixel-level contrastive learning for scribble-supervised medical image segmentation.
  • Figure 2: The overview of PCLMix. PCLMix is built upon a heterogeneous dual-decoder backbone. It follows a four-step process: 1) data enters the network, generating pseudo labels and uncertainty maps; 2) confirmed labels and scribbles construct a memory queue for contrastive learning; 3) the predicted segmentations are shuffled and mixed with images and labels to create augmented data; 4) the augmented data is fed back into the network. The entire training process is driven by supervised loss, contrastive loss, and consistency loss.
  • Figure 3: Sensitivity analysis of transformer decoder weight ($\lambda_{t}$) and contrastive loss weight ($\lambda_{ctr}$).
  • Figure 4: Qualitative comparison of different methods on the ACDC dataset. The selected subjects are the median cases with respect to the Dice scores of the fully supervised segmentation results.
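Step 1 of the pipeline in Figure 2 produces pseudo labels and uncertainty maps from the network output. The text here does not say how uncertainty is measured, so the sketch below assumes one common choice: the Shannon entropy of the per-pixel softmax distribution, normalized to [0, 1]. All function names are hypothetical, and the code uses nested Python lists in place of tensors to stay dependency-free.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_uncertainty(probs):
    """Normalized Shannon entropy in [0, 1] (assumed uncertainty measure).

    0 means a one-hot prediction, 1 means a uniform prediction.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def pseudo_labels_and_uncertainty(logit_map):
    """Map an H x W x C nested list of logits to (labels, uncertainty).

    labels[i][j] is the argmax class; uncertainty[i][j] is the
    normalized entropy of the softmax prediction at that pixel.
    """
    labels, uncertainty = [], []
    for row in logit_map:
        label_row, unc_row = [], []
        for logits in row:
            probs = softmax(logits)
            label_row.append(max(range(len(probs)), key=probs.__getitem__))
            unc_row.append(pixel_uncertainty(probs))
        labels.append(label_row)
        uncertainty.append(unc_row)
    return labels, uncertainty


# One row, two pixels, two classes: an ambiguous pixel and a confident one.
demo_labels, demo_unc = pseudo_labels_and_uncertainty(
    [[[0.0, 0.0], [0.0, 10.0]]]
)
```

Under this assumption, pixels with low entropy would be natural candidates for the "confirmed labels" that feed the memory queue in step 2, while high-entropy pixels would be left out; the actual selection rule used by the authors may differ.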