Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology

Olivier Dehaene, Axel Camara, Olivier Moindrot, Axel de Lavergne, Pierre Courtiol

TL;DR

This work tackles weak supervision in histology by replacing ImageNet pretraining with in-domain self-supervised learning using MoCo v2 on unlabeled histology tiles. The authors integrate the resulting tile encoder into MIL-based weakly-supervised pipelines, achieving substantial performance gains on Camelyon16 and CMS classification on TCGA-COAD, and reducing cross-fold variability. They also demonstrate that the learned embeddings partition histology into biologically meaningful tissue structures and that the feature extractor can transfer across datasets, approaching the performance of strongly-supervised models on at least some tasks. The results argue for a universal, self-supervised histology feature extractor as a drop-in replacement for existing approaches, with practical implications for scalable analysis of whole-slide images.

Abstract

One of the biggest challenges for applying machine learning to histopathology is weak supervision: whole-slide images have billions of pixels yet often only one global label. The state of the art therefore relies on strongly-supervised model training using additional local annotations from domain experts. However, in the absence of detailed annotations, most weakly-supervised approaches depend on a frozen feature extractor pre-trained on ImageNet. We identify this as a key weakness and propose to train an in-domain feature extractor on histology images using MoCo v2, a recent self-supervised learning algorithm. Experimental results on Camelyon16 and TCGA show that the proposed extractor greatly outperforms its ImageNet counterpart. In particular, our results improve the weakly-supervised state of the art on Camelyon16 from 91.4% to 98.7% AUC, thereby closing the gap with strongly-supervised models that reach 99.3% AUC. Through these experiments, we demonstrate that feature extractors trained via self-supervised learning can act as drop-in replacements to significantly improve existing machine learning techniques in histology. Lastly, we show that the learned embedding space exhibits biologically meaningful separation of tissue structures.
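The abstract names MoCo v2 as the self-supervised algorithm but does not spell out its objective. The following is a minimal sketch of the two ingredients MoCo is generally described with: a contrastive (InfoNCE) loss between a query embedding, its positive key, and a set of negative keys, and a momentum update of the key encoder. It is written in plain Python to stay dependency-free. The function names, the toy vectors, and the default `temperature=0.2` and `momentum=0.999` are assumptions taken from the public MoCo v2 description, not values confirmed by this paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit L2 norm, as is usual before a contrastive loss."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss for one query: -log softmax of the positive logit.

    The temperature default is an assumption, not a value from the paper.
    """
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]

def momentum_update(key_weights, query_weights, momentum=0.999):
    """Move key-encoder weights slowly toward the query-encoder weights."""
    return [momentum * wk + (1.0 - momentum) * wq
            for wk, wq in zip(key_weights, query_weights)]

# Toy example: two augmented views of the same tile should give a low loss.
q = normalize([1.0, 0.0])
k_pos = normalize([0.9, 0.1])
negatives = [normalize([0.0, 1.0]), normalize([-1.0, 0.0])]
print(info_nce(q, k_pos, negatives))
```

In a real pipeline the vectors would be encoder outputs for augmented histology tiles and the negatives would come from a queue of past keys; the sketch only shows the shape of the computation.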

Paper Structure

This paper contains 34 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Proposed pipeline. We train a ResNet encoder on histology tiles using MoCo v2, a self-supervised learning algorithm (a), and use the trained encoder as a feature extractor for multiple instance learning (MIL) (b). More details in Section \ref{sec:methods}.
  • Figure 1: The five most representative tiles of each of the 10 clusters found in the Camelyon16 tile embedding for MoCo v2. In orange, tumoral tissue. Please refer to Figure \ref{fig:cam_test_001_moco_heatmap} for an example of this cluster overlaid on a Camelyon16 slide.
  • Figure 2: (a) Tumor annotations (orange) displayed on a Camelyon16 test slide. (b) The best performing cluster on ImageNet features among 10 clusters obtains 69.4% AUC. (c) The best performing cluster on MoCo v2 features among 10 clusters obtains 95.1% AUC and matches the annotations almost perfectly, while being fully unsupervised.
  • Figure 2: The five most representative tiles of each of the 10 clusters found in the TCGA-COAD tile embedding for MoCo v2. In blue, mucosa with normal intestinal glands. In green, muscularis mucosae and submucosa. In red, tumoral tissue. Please refer to Figure \ref{fig:coad_all_clusters} for an example of these three clusters overlaid on a TCGA-COAD slide.
  • Figure 3: (a) A slide from TCGA-COAD with rough marker annotations. Blue: mucosa with normal intestinal glands, green: muscularis mucosae and submucosa, red: tumoral tissue. (b) For each color, the best matching cluster on MoCo v2 features among 10 clusters is displayed.
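The captions above report the AUC of the "best performing cluster" among 10 but do not say how it is computed. One plausible reading, sketched below, scores each tile by its membership in a given cluster, computes the AUC of that score against tile-level tumor annotations, and keeps the cluster with the highest AUC. This is an assumption, not the authors' confirmed protocol. The cluster assignments are taken as given (for example from k-means on tile embeddings), and the names `auc` and `best_cluster` and the toy data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_cluster(cluster_ids, tumor_labels):
    """Return (cluster, AUC) for the cluster whose membership best predicts tumor."""
    results = {}
    for cluster in sorted(set(cluster_ids)):
        # Binary score: 1.0 if the tile belongs to this cluster, else 0.0.
        membership = [1.0 if cid == cluster else 0.0 for cid in cluster_ids]
        results[cluster] = auc(membership, tumor_labels)
    best = max(results, key=results.get)
    return best, results[best]

# Toy data: six tiles, three clusters, tile-level tumor labels (1 = tumor).
cluster_ids = [0, 0, 1, 1, 2, 2]
tumor_labels = [0, 0, 1, 1, 0, 1]
print(best_cluster(cluster_ids, tumor_labels))
```

The pairwise AUC is quadratic in the number of tiles, which is fine for illustration; a rank-based implementation would be preferable on full slides.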