Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation
Jamshid Hassanpour, Vinkle Srivastav, Didier Mutter, Nicolas Padoy
TL;DR
The paper addresses dimensional collapse in self-supervised contrastive learning for medical image segmentation, showing that MoCo v2 underutilizes the representation space because medical images are highly similar to one another. It introduces two strategies to enrich backbone features: local feature learning, and feature decorrelation via ZCA whitening applied before the projector. With pretraining on AbdomenCT-1K and segmentation evaluated on BTCV, the approach yields an ~8% improvement in mean Dice score under linear evaluation and better performance under full fine-tuning, and ablations attribute gains to each component. The work shows how tailoring SSL objectives to the characteristics of medical images can produce robust segmentations without relying heavily on a complex decoder.
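To make "feature decorrelation via ZCA whitening" concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes pooled backbone features arranged as an (N, D) batch matrix, and the helper name `zca_whiten` and the stabilizing constant `eps` are hypothetical choices. Whitening transforms the batch so that its covariance is approximately the identity, which removes linear dependence between feature dimensions. The ZCA variant also rotates the result back into the original coordinate frame, so each whitened dimension stays as close as possible to its input dimension.

```python
import numpy as np


def zca_whiten(features: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """ZCA-whiten a batch of features (hypothetical helper, not the paper's code).

    features: array of shape (N, D), one row per sample.
    Returns an (N, D) array whose empirical covariance is close to the identity.
    """
    # Center every feature dimension over the batch.
    centered = features - features.mean(axis=0, keepdims=True)
    n = features.shape[0]
    # Empirical covariance with the unbiased (n - 1) normalization, shape (D, D).
    cov = centered.T @ centered / (n - 1)
    # The covariance is symmetric, so eigh gives cov = U diag(s) U^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA matrix W = U diag(1 / sqrt(s + eps)) U^T; eps (an assumed value)
    # keeps near-zero eigenvalues from blowing up the inverse square root.
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    # W is symmetric, so the output covariance is W cov W, roughly the identity.
    return centered @ w


# Usage: strongly correlated features (4 latent factors mixed into 16 dims + noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(512, 4))
mix = rng.normal(size=(4, 16))
feats = latent @ mix + 0.1 * rng.normal(size=(512, 16))
white = zca_whiten(feats)
print(np.round(np.cov(white, rowvar=False)[:3, :3], 3))
```

Using `eigh` rather than a general eigensolver exploits the symmetry of the covariance. Note that this sketch needs N comfortably larger than D for a well-conditioned estimate; how the paper handles batch statistics, running averages, or gradients through the whitening step is not specified here.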
Abstract
Self-supervised learning (SSL) approaches have achieved great success when the amount of labeled data is limited. Within SSL, models learn robust feature representations by solving pretext tasks. One such pretext task is contrastive learning, which involves forming pairs of similar and dissimilar input samples and guiding the model to distinguish between them. In this work, we investigate the application of contrastive learning to medical image analysis. Our findings reveal that MoCo v2, a state-of-the-art contrastive learning method, encounters dimensional collapse when applied to medical images. We attribute this to the high degree of inter-image similarity among medical images. To address this, we propose two key contributions: local feature learning and feature decorrelation. Local feature learning improves the model's ability to focus on local image regions, while feature decorrelation removes the linear dependence among the features. Our experimental findings demonstrate that our contributions significantly enhance the model's performance on the downstream task of medical image segmentation, in both the linear evaluation and full fine-tuning settings. This work illustrates the importance of effectively adapting SSL techniques to the characteristics of medical imaging tasks. The source code will be made publicly available at: https://github.com/CAMMA-public/med-moco
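Dimensional collapse means the learned embeddings occupy a lower-dimensional subspace than the embedding space provides. A common way to detect it is to inspect the singular value spectrum of the embedding covariance matrix: collapsed directions show up as singular values near zero. The sketch below computes that spectrum plus one possible scalar summary, an entropy-based effective rank. It is an illustrative diagnostic only. The helper names `covariance_spectrum` and `effective_rank` are hypothetical, and we do not claim this is the exact measurement used in the paper.

```python
import numpy as np


def covariance_spectrum(embeddings: np.ndarray) -> np.ndarray:
    """Singular values of the embedding covariance, in descending order.

    embeddings: (N, D) array, one embedding vector per image (hypothetical helper).
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (embeddings.shape[0] - 1)
    # np.linalg.svd returns singular values sorted from largest to smallest.
    return np.linalg.svd(cov, compute_uv=False)


def effective_rank(singular_values: np.ndarray, eps: float = 1e-12) -> float:
    """exp of the Shannon entropy of the normalized singular values.

    Equals D when all D values are equal, and is at most k when only
    k values are non-negligible (hypothetical helper).
    """
    p = singular_values / singular_values.sum()
    p = p[p > eps]  # drop numerically zero entries so log(p) is defined
    return float(np.exp(-(p * np.log(p)).sum()))


# Usage: well-spread vs. collapsed 64-dimensional embeddings.
rng = np.random.default_rng(0)
spread = rng.normal(size=(2048, 64))
# Collapsed case: every 64-dim embedding lies in a 3-dim subspace.
collapsed = rng.normal(size=(2048, 3)) @ rng.normal(size=(3, 64))
print(effective_rank(covariance_spectrum(spread)))     # close to 64
print(effective_rank(covariance_spectrum(collapsed)))  # at most about 3
```

Plotting the logarithm of the spectrum is usually more informative than a single number, since it shows how many directions carry almost no variance.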
