Stronger is not better: Better Augmentations in Contrastive Learning for Medical Image Segmentation
Azeez Idris, Abdurahman Ali Mohammed, Samuel Fanijo
TL;DR
The paper investigates whether strong data augmentations, proven effective in natural-image contrastive learning, translate to medical image segmentation. Using a SimCLR-based pretraining framework on the KVASIR-SEG dataset with a U-Net downstream model, it compares strong augmentations (e.g., crop+resize, color distortion, Gaussian blur) against basic augmentations (resize, rotate, horizontal flip). The results show that basic augmentations often outperform strong ones across Dice, IoU, F-score, recall, and precision, even with larger batch sizes and ImageNet-pretrained weights. The authors conclude that augmentations should be tailored to the medical dataset rather than borrowed from natural-image benchmarks, and they suggest further exploration of other SimCLR components, supported by theoretical analyses. This work highlights the importance of dataset-specific augmentation strategies for self-supervised learning in medical segmentation.
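As a reference for the overlap metrics named above, the following is a minimal sketch using their standard definitions for a single pair of binary masks. The function names and the nested-list mask format are assumptions made for illustration. The paper's exact evaluation protocol (thresholding, averaging over images) is not given in this summary. With these per-mask definitions, the F1 score coincides with Dice, so the sketch does not compute a separate F-score.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives
    over two same-shaped binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, target):
    """Standard per-mask definitions (an assumption about the protocol):
    Dice = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN),
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "dice": _safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": _safe_div(tp, tp + fp + fn),
        "precision": _safe_div(tp, tp + fp),
        "recall": _safe_div(tp, tp + fn),
    }
```

For a predicted mask [[1, 1, 0], [0, 1, 0]] and a ground-truth mask [[1, 0, 0], [0, 1, 1]], there are 2 true positives, 1 false positive, and 1 false negative, giving Dice = 2/3 and IoU = 1/2.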
Abstract
Self-supervised contrastive learning is among the recent representation learning methods that have shown performance gains in several downstream tasks, including semantic segmentation. This paper evaluates strong data augmentation, one of the most important contributors to the improved performance of self-supervised contrastive learning. Strong data augmentation applies a composition of multiple augmentation techniques to each image. Surprisingly, we find that the existing data augmentations do not always improve performance on semantic segmentation of medical images. We experiment with other augmentations that provide improved performance.
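To make "composition of multiple augmentation techniques" concrete, here is a minimal pure-Python sketch of a basic and a strong pipeline, each used to produce the two views that form a positive pair in SimCLR. It is not the authors' implementation. All function names and parameter values are hypothetical, images are grayscale nested lists with values in [0, 1], rotation is restricted to multiples of 90 degrees, and brightness jitter and a 3x3 box blur stand in for color distortion and Gaussian blur.

```python
import random


def hflip(img):
    return [row[::-1] for row in img]


def rotate90(img):
    # Clockwise rotation by 90 degrees.
    return [list(col) for col in zip(*img[::-1])]


def resize_nearest(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def random_hflip(img, rng, p=0.5):
    return hflip(img) if rng.random() < p else img


def random_rotate(img, rng):
    # Simplification: rotate by a random multiple of 90 degrees.
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    return img


def random_crop_resize(img, rng, min_frac=0.5):
    # Crop a random window, then resize it back to the input size.
    h, w = len(img), len(img[0])
    ch = max(1, int(h * rng.uniform(min_frac, 1.0)))
    cw = max(1, int(w * rng.uniform(min_frac, 1.0)))
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    crop = [row[left:left + cw] for row in img[top:top + ch]]
    return resize_nearest(crop, h, w)


def brightness_jitter(img, rng, strength=0.4):
    # Stand-in for color distortion on a grayscale image.
    factor = 1.0 + rng.uniform(-strength, strength)
    return [[min(1.0, max(0.0, v * factor)) for v in row] for row in img]


def box_blur(img):
    # 3x3 mean filter, a stand-in for Gaussian blur.
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def compose(*steps):
    """Chain augmentation steps, each with signature step(img, rng)."""
    def apply(img, rng):
        for step in steps:
            img = step(img, rng)
        return img
    return apply


def basic_pipeline(size=4):
    # Resize to a square first so that rotation preserves the shape.
    return compose(lambda img, rng: resize_nearest(img, size, size),
                   random_rotate, random_hflip)


def strong_pipeline():
    return compose(random_crop_resize, brightness_jitter,
                   lambda img, rng: box_blur(img))


def two_views(img, pipeline, rng):
    """Two independently augmented views of the same image (a positive pair)."""
    return pipeline(img, rng), pipeline(img, rng)
```

Passing a seeded `random.Random` instance as `rng` keeps the views reproducible. Swapping `basic_pipeline()` for `strong_pipeline()` in `two_views` is the only change needed to switch between the two augmentation regimes in this sketch.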
