Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training

Shuang Zeng, Lei Zhu, Xinliang Zhang, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye Lu

TL;DR

MACL tackles label-scarce medical image segmentation with one-stage asymmetric contrastive pre-training that jointly optimizes an encoder and a partial decoder. It integrates multi-level learning across image-level, feature-level, and pixel-level representations through a dual-branch architecture and a combined loss $L_{MLC}=\lambda_1 L_g + \lambda_2 L_d + \lambda_3 L_{ER}$, where $L_g$ is the global contrastive loss, $L_d$ the dense contrastive loss, and $L_{ER}$ the equivariant regularization loss. Empirical results on eight datasets spanning CT and MRI show that MACL consistently surpasses eleven baselines and generalizes across backbones, enabling strong segmentation performance with limited annotations. The approach offers a scalable, label-efficient pre-training paradigm for volumetric medical segmentation with practical impact on clinical workflows, and code will be released for reproducibility.
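To make the combined objective concrete, here is a minimal sketch in plain Python. It assumes the global term behaves like a SimCLR-style NT-Xent loss over paired embeddings. The function names `nt_xent` and `multi_level_loss`, the temperature, and the default weights are illustrative assumptions and are not taken from the paper or its code release.

```python
import math


def nt_xent(z_a, z_b, temperature=0.1):
    """SimCLR-style contrastive loss; row i of z_a and row i of z_b form a positive pair.

    Assumption: this stands in for a global contrastive term such as L_g.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    views = [normalize(v) for v in z_a] + [normalize(v) for v in z_b]
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        # Cosine similarity to every other view, scaled by the temperature.
        logits = {
            j: sum(p * q for p, q in zip(views[i], views[j])) / temperature
            for j in range(2 * n) if j != i
        }
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += log_denominator - logits[pos]
    return total / (2 * n)


def multi_level_loss(l_g, l_d, l_er, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum L_MLC = lambda_1 * L_g + lambda_2 * L_d + lambda_3 * L_ER.

    The default weights are placeholders, not values reported by the authors.
    """
    lam1, lam2, lam3 = lambdas
    return lam1 * l_g + lam2 * l_d + lam3 * l_er
```

In this sketch the three terms are computed separately and only combined at the end, so the weights $\lambda_1$, $\lambda_2$, $\lambda_3$ can be tuned without touching the individual losses.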

Abstract

Medical image segmentation is a fundamental yet challenging task because acquiring large volumes of high-quality labeled data from experts is arduous. Contrastive learning offers a promising but still problematic solution to this dilemma. First, existing medical contrastive learning strategies focus on extracting image-level representations and ignore abundant multi-level representations. Second, they underutilize the decoder, either initializing it randomly or pre-training it separately from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations so that the encoder and decoder capture comprehensive details from representations of varying scales and granularities during pre-training. Finally, experiments on 8 medical image datasets indicate that MACL outperforms 11 existing contrastive learning strategies. For example, MACL produces more precise predictions in the visualizations and achieves Dice scores 1.72%, 7.87%, 2.49% and 1.48% higher than the previous best results on ACDC, MMWHS, HVSMR and CHAOS, respectively, with 10% labeled data. MACL also generalizes well across 5 U-Net variant backbones. Our code will be released at https://github.com/stevezs315/MACL.
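The reported gains are stated in terms of the Dice score. As a reminder of what that metric measures, the following sketch computes per-class and mean Dice on flattened label maps. The helper names and the convention of returning 1.0 when a class is absent from both maps are assumptions made for illustration; the paper's evaluation code may differ.

```python
def dice_score(pred, target, label):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for one class on flattened label maps."""
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    denominator = sum(pred_hits) + sum(target_hits)
    if denominator == 0:
        return 1.0  # assumed convention: class absent from both maps
    return 2.0 * intersection / denominator


def mean_dice(pred, target, labels):
    """Average Dice over the given foreground labels."""
    return sum(dice_score(pred, target, c) for c in labels) / len(labels)


# Toy example: six voxels, background 0 and two foreground classes.
prediction = [0, 1, 1, 2, 2, 0]
reference = [0, 1, 2, 2, 2, 0]
score = mean_dice(prediction, reference, labels=[1, 2])
```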

Paper Structure (20 sections, 13 equations, 10 figures, 5 tables)

This paper contains 20 sections, 13 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Comparison of different CL frameworks for medical images: (a) Common CL frameworks used for medical image segmentation are symmetric and similar to SimCLR, with a global contrastive loss $\mathcal{L}_{g}$ for optimization. (b) The two-stage CL framework pre-trains the encoder with the global contrastive loss $\mathcal{L}_{g}$ and the decoder with a local contrastive loss $\mathcal{L}_{l}$ in separate stages, which ignores the collaboration between the encoder and decoder. (c) Our proposed MACL framework is asymmetric, with an additional partial decoder for pre-training, and integrates a multi-level CL strategy (feature-level equivariant regularization plus image-level and pixel-level CL) to obtain a better initialization of both encoder and decoder for downstream segmentation tasks.
  • Figure 2: Overview of our proposed MACL framework. The input image is propagated to two branches: the dominant branch and the auxiliary branch. In the dominant branch, after augmentation and downsampling, the input image is fed into the encoder to obtain the feature-level representation, which is then propagated to the decoder and projector to obtain image-level and pixel-level projections. The auxiliary branch follows the same process except that no decoder is used. The feature-level representations from the two branches are used in the equivariant regularization loss, while the image-level and pixel-level projections are used in the global and dense contrastive losses, respectively, to achieve multi-level contrastive learning.
  • Figure 3: Visualization of multi-organ segmentation results on the ACDC and MMWHS datasets. Our proposed MACL achieves a higher MIoU and more precise predictions of substructures than the other 11 methods.
  • Figure 4: Visualization of multi-organ segmentation results on the CHAOS and HVSMR datasets. Our proposed MACL achieves a higher MIoU and more precise predictions of substructures than the other 11 methods.
  • Figure 5: Visualization of ROI-based segmentation results on Spleen, Heart and ISIC datasets.
  • ...and 5 more figures
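The data flow described in the Figure 2 caption can be sketched as plain function composition. This is only a routing sketch under stated assumptions: `encoder`, `decoder`, `global_head`, `dense_head` and the two augmentation callables are hypothetical placeholders, augmentation and downsampling are folded into one callable, and applying the projection heads directly to the encoder features in the auxiliary branch is an assumption rather than a detail confirmed by the caption.

```python
def macl_forward(image, augment_dom, augment_aux,
                 encoder, decoder, global_head, dense_head):
    """Route one input through the dominant and auxiliary branches.

    Returns, per loss term, the pair of tensors that term would compare.
    All callables are hypothetical stand-ins for real network modules.
    """
    # Dominant branch: augment -> encoder -> decoder -> projection heads.
    feat_dom = encoder(augment_dom(image))
    decoded = decoder(feat_dom)
    global_dom = global_head(decoded)
    dense_dom = dense_head(decoded)

    # Auxiliary branch: same steps, but no decoder (the asymmetry).
    feat_aux = encoder(augment_aux(image))
    global_aux = global_head(feat_aux)
    dense_aux = dense_head(feat_aux)

    return {
        "equivariant": (feat_dom, feat_aux),  # feature level, feeds L_ER
        "global": (global_dom, global_aux),   # image level, feeds L_g
        "dense": (dense_dom, dense_aux),      # pixel level, feeds L_d
    }


# Toy usage with lists standing in for tensors.
outputs = macl_forward(
    [1.0, 2.0, 3.0],
    augment_dom=lambda x: list(x),
    augment_aux=lambda x: list(reversed(x)),
    encoder=lambda x: [2.0 * v for v in x],
    decoder=lambda f: [v + 1.0 for v in f],
    global_head=lambda f: [sum(f) / len(f)],
    dense_head=lambda f: list(f),
)
```

The point of the sketch is the asymmetry: only the dominant branch passes through the decoder, so the decoder receives gradients during pre-training while the auxiliary branch stays lightweight.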