Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology
Olivier Dehaene, Axel Camara, Olivier Moindrot, Axel de Lavergne, Pierre Courtiol
TL;DR
This work addresses weak supervision in histology by replacing ImageNet pretraining with in-domain self-supervised learning: a tile encoder is trained with MoCo v2 on unlabeled histology tiles. The authors plug the resulting encoder into weakly-supervised pipelines based on multiple-instance learning (MIL) and report substantial gains on Camelyon16 and on CMS classification on TCGA-COAD, along with lower cross-fold variability. On Camelyon16, the weakly-supervised state of the art rises from 91.4% to 98.7% AUC, close to the 99.3% AUC reached by strongly-supervised models. They also show that the learned embeddings separate histology tiles into biologically meaningful tissue structures and that the feature extractor transfers across datasets. The results argue for a universal, self-supervised histology feature extractor as a drop-in replacement for ImageNet-pretrained extractors, with practical implications for scalable analysis of whole-slide images.
Abstract
One of the biggest challenges for applying machine learning to histopathology is weak supervision: whole-slide images have billions of pixels yet often only one global label. The state of the art therefore relies on strongly-supervised model training using additional local annotations from domain experts. However, in the absence of detailed annotations, most weakly-supervised approaches depend on a frozen feature extractor pre-trained on ImageNet. We identify this as a key weakness and propose to train an in-domain feature extractor on histology images using MoCo v2, a recent self-supervised learning algorithm. Experimental results on Camelyon16 and TCGA show that the proposed extractor greatly outperforms its ImageNet counterpart. In particular, our results improve the weakly-supervised state of the art on Camelyon16 from 91.4% to 98.7% AUC, thereby closing the gap with strongly-supervised models that reach 99.3% AUC. Through these experiments, we demonstrate that feature extractors trained via self-supervised learning can act as drop-in replacements to significantly improve existing machine learning techniques in histology. Lastly, we show that the learned embedding space exhibits biologically meaningful separation of tissue structures.
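
To make the self-supervised objective more concrete, the following is a minimal sketch of the two ingredients MoCo-style training is generally built on: an InfoNCE contrastive loss between a query embedding, one positive key, and a set of negative keys, and a momentum update of the key encoder's parameters. It is written in plain Python for readability and is not the authors' implementation. The function names, the flat parameter lists, and the default values (`temperature=0.2`, `momentum=0.999`) are illustrative assumptions, not settings taken from this paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query (hypothetical helper, for illustration).

    The positive key is an embedding of another augmented view of the same
    tile; the negative keys stand in for embeddings of other tiles.
    """
    q = l2_normalize(query)
    # Index 0 holds the positive logit, the rest are negatives.
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp, then cross-entropy with target index 0.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, momentum=0.999):
    """Move key-encoder parameters slowly toward the query encoder's."""
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]


# Toy 2-D embeddings: the loss is low when the positive key aligns with the query.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)
```

In a real pipeline the embeddings would come from a convolutional tile encoder and the negatives from a queue of past keys. Once trained, the encoder would be used to embed tiles for the downstream weakly-supervised model, as the abstract describes.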
