Overcoming Dimensional Collapse in Self-supervised Contrastive Learning for Medical Image Segmentation

Jamshid Hassanpour, Vinkle Srivastav, Didier Mutter, Nicolas Padoy

TL;DR

The paper addresses dimensional collapse in self-supervised contrastive learning for medical image segmentation, showing that MoCo v2 underutilizes the representation space because of high inter-image similarity. It introduces two strategies to enrich backbone features: local feature learning, and feature decorrelation via ZCA whitening applied before the projector. With pretraining on AbdomenCT-1K and segmentation on BTCV, the approach delivers a roughly 8% improvement in mean Dice score under linear evaluation and better fine-tuning performance, with ablations attributing gains to each component. The work demonstrates how tailoring SSL objectives to the characteristics of medical images can yield robust segmentations without relying heavily on decoder complexity.
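
For readers unfamiliar with the metric named above, the following is a minimal sketch of a mean Dice score over label maps. The function names, the `eps` smoothing term, and the choice to average over a given list of class ids are illustrative assumptions; the paper's exact evaluation protocol (for example, per-organ or per-volume averaging) is not stated here.

```python
import numpy as np

def dice_score(pred_mask, target_mask, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (eps is an assumed smoothing term)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    target_mask = np.asarray(target_mask, dtype=bool)
    intersection = np.logical_and(pred_mask, target_mask).sum()
    return (2.0 * intersection + eps) / (pred_mask.sum() + target_mask.sum() + eps)

def mean_dice(pred_labels, target_labels, class_ids):
    """Average the per-class Dice over the given foreground class ids."""
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    scores = [dice_score(pred_labels == c, target_labels == c) for c in class_ids]
    return float(np.mean(scores))

# Toy 2x3 label maps with two foreground classes (1 and 2); 0 is background.
pred = np.array([[1, 1, 0],
                 [0, 2, 2]])
target = np.array([[1, 0, 0],
                   [0, 2, 2]])
score = mean_dice(pred, target, class_ids=[1, 2])
```

In this toy case class 1 overlaps on one of three labeled pixels (Dice 2/3) and class 2 matches exactly (Dice 1), so the mean is about 0.83.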

Abstract

Self-supervised learning (SSL) approaches have achieved great success when the amount of labeled data is limited. Within SSL, models learn robust feature representations by solving pretext tasks. One such pretext task is contrastive learning, which involves forming pairs of similar and dissimilar input samples and guiding the model to distinguish between them. In this work, we investigate the application of contrastive learning to the domain of medical image analysis. Our findings reveal that MoCo v2, a state-of-the-art contrastive learning method, encounters dimensional collapse when applied to medical images. We attribute this to the high degree of similarity between medical images. To address this, we propose two key contributions: local feature learning and feature decorrelation. Local feature learning improves the ability of the model to focus on the local regions of the image, while feature decorrelation removes the linear dependence among the features. Our experiments demonstrate that these contributions significantly enhance the model's performance in the downstream task of medical image segmentation, in both the linear evaluation and full fine-tuning settings. This work illustrates the importance of effectively adapting SSL techniques to the characteristics of medical imaging tasks. The source code will be made publicly available at: https://github.com/CAMMA-public/med-moco
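
To make the feature decorrelation idea concrete, here is a minimal NumPy sketch of ZCA whitening on a batch of feature vectors. It is not the authors' implementation: the function name `zca_whiten`, the `eps` regularizer, the toy mixing matrix, and the choice to whiten a whole batch at once are assumptions made for illustration. The point is only that, after whitening, the sample covariance is close to the identity, so no feature is a linear function of the others.

```python
import numpy as np

def zca_whiten(features, eps=1e-5):
    """Whiten an (n_samples, n_dims) matrix so its covariance is ~identity.

    ZCA uses W = U diag(1 / sqrt(lambda + eps)) U^T, where U and lambda are the
    eigenvectors and eigenvalues of the feature covariance. eps is an assumed
    regularizer that guards against division by near-zero eigenvalues.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric, so eigh applies
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitening

# Toy stand-in for backbone features: 4 dimensions that are strongly correlated.
rng = np.random.default_rng(0)
mixing = np.eye(4) + 0.5 * np.ones((4, 4))  # hypothetical mixing matrix
features = rng.normal(size=(512, 4)) @ mixing

whitened = zca_whiten(features)
corr_before = np.corrcoef(features, rowvar=False)  # large off-diagonal entries
cov_after = np.cov(whitened, rowvar=False)         # approximately the identity
```

Among whitening transforms, ZCA is the one that keeps the whitened features as close as possible to the centered originals, which is one reason it is a natural choice when the features are passed on to a subsequent layer.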

Paper Structure (8 sections, 5 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Spectrum of singular values under different pre-training schemes. With regular MoCo v2 pre-training, the singular values are near zero, which indicates dimensional collapse. Local feature learning and feature decorrelation improve the backbone features and thus increase the representation dimension.
  • Figure 2: The proposed architecture of the modified MoCo v2: The local loss is applied to the averaged feature maps from the first layer of the backbone, and ZCA whitening is done in the last layer of the backbone before the projector.
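
A singular value spectrum like the one described for Figure 1 can be computed along the following lines. This is a hedged sketch of one common diagnostic, not necessarily the exact procedure used in the paper: it takes the singular values of the embedding covariance matrix and counts how many are non-negligible. The function names, the `rel_tol` threshold, and the synthetic embeddings are assumptions.

```python
import numpy as np

def singular_value_spectrum(embeddings):
    """Singular values (descending) of the covariance of (n_samples, n_dims) embeddings."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (embeddings.shape[0] - 1)
    return np.linalg.svd(cov, compute_uv=False)

def effective_dims(singular_values, rel_tol=1e-6):
    """Count singular values above rel_tol times the largest one (assumed threshold)."""
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

rng = np.random.default_rng(0)

# Embeddings that use all 16 dimensions.
full = rng.normal(size=(1000, 16))

# "Collapsed" embeddings: 16-dimensional vectors confined to a 4-dimensional subspace.
collapsed = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 16))

full_spectrum = singular_value_spectrum(full)
collapsed_spectrum = singular_value_spectrum(collapsed)

print(effective_dims(full_spectrum), effective_dims(collapsed_spectrum))
```

Plotting the spectra on a logarithmic axis makes the collapse visible as a sharp drop: in the synthetic collapsed case, all singular values past the fourth sit at numerical zero.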