Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training
Shuang Zeng, Lei Zhu, Xinliang Zhang, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye Lu
TL;DR
MACL tackles label-scarce medical image segmentation with a one-stage asymmetric contrastive pre-training scheme that jointly optimizes an encoder and a partial decoder. It integrates multi-level learning across image-level, feature-level, and pixel-level representations through a dual-branch architecture and a combined loss $L_{MLC}=\lambda_1 L_g + \lambda_2 L_d + \lambda_3 L_{ER}$. Results on eight datasets spanning CT and MRI show that MACL consistently surpasses eleven baselines and generalizes across backbones, giving strong segmentation performance with limited annotations. The approach offers a scalable, label-efficient pre-training paradigm for volumetric medical segmentation with practical relevance to clinical workflows, and the code will be released for reproducibility.
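The combined loss above is a weighted sum of three terms. A minimal Python sketch of that combination follows. The function name `multi_level_loss`, the argument names, and the default weights of 1.0 are assumptions made for illustration; this summary does not define the individual terms $L_g$, $L_d$, and $L_{ER}$ or state the values of $\lambda_1$, $\lambda_2$, and $\lambda_3$, so they are taken here as already-computed scalars.

```python
def multi_level_loss(l_g, l_d, l_er, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum L_MLC = lambda_1 * L_g + lambda_2 * L_d + lambda_3 * L_ER.

    l_g, l_d, l_er: already-computed scalar loss terms (their definitions
        are not given in this summary).
    lambdas: hypothetical placeholder weights, not values from the paper.
    """
    lam1, lam2, lam3 = lambdas
    return lam1 * l_g + lam2 * l_d + lam3 * l_er


# Example with made-up loss values and weights.
total = multi_level_loss(1.0, 2.0, 3.0, lambdas=(0.5, 0.25, 1.0))
print(total)
```

The same expression should work unchanged if the three terms are framework tensors rather than Python floats, since it only uses multiplication and addition.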
Abstract
Medical image segmentation is a fundamental yet challenging task because acquiring large volumes of high-quality, expert-labeled data is arduous. Contrastive learning offers a promising but still problematic solution to this dilemma. First, existing medical contrastive learning strategies focus on extracting image-level representations and ignore the abundant multi-level representations. Furthermore, they underutilize the decoder, either initializing it randomly or pre-training it separately from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations, so that the encoder and decoder capture comprehensive details from representations of varying scales and granularities during pre-training. Finally, experiments on 8 medical image datasets indicate that MACL outperforms 11 existing contrastive learning strategies. For example, with 10% labeled data, MACL produces more precise predictions in the visualizations and achieves Dice scores 1.72%, 7.87%, 2.49%, and 1.48% higher than the previous best results on ACDC, MMWHS, HVSMR, and CHAOS, respectively. MACL also generalizes well across 5 U-Net variant backbones. Our code will be released at https://github.com/stevezs315/MACL.
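The results above are reported as Dice scores. For readers unfamiliar with the metric, the sketch below computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two flattened binary masks. It is an illustration of the general metric, not the evaluation code of this work; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_score(pred, target):
    """Dice coefficient between two flattened binary masks (lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Example: one overlapping voxel, 2 predicted and 1 true foreground voxel.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
print(round(score, 4))
```

For multi-class segmentation, the score is typically computed per class on one-vs-rest masks and then averaged.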
