Adapting Self-Supervised Learning for Computational Pathology

Eric Zimmermann, Neil Tenenholtz, James Hall, George Shaikovski, Michal Zelechowski, Adam Casson, Fausto Milletari, Julian Viret, Eugene Vorontsov, Siqi Liu, Kristen Severson

TL;DR

This work investigates how self-supervised learning for computer vision, specifically DINOv2, can be adapted to computational pathology, where gigapixel whole-slide images (WSIs) and fixed magnifications pose unique challenges. It introduces three domain-informed modifications: extended-context translation, which preserves morphology during augmentation; kernel density estimation (KDE)-based regularization, which replaces the KoLeo entropy regularizer and better handles repetitive tissue patterns; and magnification-aware position encodings (CSD and LPM), which respect known physical scales. Through large-scale pretraining on MSKCC WSIs and evaluation across seven tile-level tasks, the study shows that morphology-preserving augmentations and KDE regularization substantially improve representation quality, with mean performance gains on several in-domain and out-of-domain tasks. The findings underscore the importance of tailoring self-supervised objectives to pathology data and suggest avenues for stronger foundation models in computational pathology with improved cross-domain generalization.

Abstract

Self-supervised learning (SSL) has emerged as a key technique for training networks that can generalize well to diverse tasks without task-specific supervision. This property makes SSL desirable for computational pathology, the study of digitized images of tissues, as there are many target applications and often limited labeled training samples. However, SSL algorithms and models have been developed primarily for natural images, and whether their performance can be improved by adaptation to particular domains remains an open question. In this work, we present an investigation of modifications to SSL for pathology data, specifically focusing on the DINOv2 algorithm. We propose alternative augmentations, regularization functions, and position encodings motivated by the characteristics of pathology images. We evaluate the impact of these changes on several benchmarks to demonstrate the value of tailored approaches.

Paper Structure (12 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Example of a computational pathology data pipeline for self-supervised learning. Tiles (center) that satisfy a tissue inclusion criterion are randomly sampled from a WSI (left) and perturbed using a set of augmentations sampled from a policy to build the self-supervised pretext task (right). Augmentations include, but are not limited to, Gaussian blur, color jitter, grayscale, and crop-and-resize.
  • Figure 2: Illustration of crop-and-resize (top) and the modified extended-context translation augmentation strategy (bottom). Crop-and-resize randomly selects a sub-region smaller than the target size and up-samples it to the target size. Extended-context translation extends the context window and randomly selects a sub-region that is approximately the target size.
  • Figure 3: Translation behavior is controlled with a smaller aspect ratio and an adjusted scale range computed using the ratio of patch areas.
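
The contrast drawn in the Figure 2 and Figure 3 captions can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 224-pixel target, the 448-pixel context window, the crop-and-resize scale range `(0.32, 1.0)`, the `jitter` width, and the near-square aspect-ratio range are all hypothetical choices made for the example. The only ideas taken from the captions are that crop-and-resize picks a sub-region smaller than the target and up-samples it, while extended-context translation picks a sub-region of roughly the target size from a larger window, with a scale range centered on the ratio of patch areas.

```python
import math
import numpy as np


def sample_crop_box(rng, src_size, scale, ratio, max_tries=10):
    """Sample (top, left, height, width) of a random crop inside a square source.

    `scale` bounds the crop area as a fraction of the source area and
    `ratio` bounds the crop aspect ratio (width / height).
    """
    src_area = src_size * src_size
    for _ in range(max_tries):
        area = src_area * rng.uniform(scale[0], scale[1])
        # Sample the aspect ratio log-uniformly so that r and 1/r are equally likely.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(area * aspect)))
        h = int(round(math.sqrt(area / aspect)))
        if 0 < w <= src_size and 0 < h <= src_size:
            top = int(rng.integers(0, src_size - h + 1))
            left = int(rng.integers(0, src_size - w + 1))
            return top, left, h, w
    # Fallback when no valid box was found: use the whole source.
    return 0, 0, src_size, src_size


def resize_nearest(img, out_size):
    """Nearest-neighbour resize of an (H, W, C) array to (out_size, out_size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return img[rows][:, cols]


def crop_and_resize(rng, tile, target=224, scale=(0.32, 1.0), ratio=(3 / 4, 4 / 3)):
    """Crop a sub-region of the tile and up-sample it to the target size.

    The crop is never larger than the tile, so tissue is magnified by a
    random factor and, for non-square crops, stretched.
    """
    top, left, h, w = sample_crop_box(rng, tile.shape[0], scale, ratio)
    return resize_nearest(tile[top:top + h, left:left + w], target), (h, w)


def extended_context_translation(rng, context, target=224, jitter=0.1,
                                 ratio=(0.95, 1.05)):
    """Pick a roughly target-sized window from a larger context region.

    The scale range is centered on the ratio of patch areas
    (target area / context area), so the resampling factor stays close to 1
    and the crop acts mostly as a translation. `jitter` and `ratio` are
    assumed values for illustration.
    """
    area_ratio = (target / context.shape[0]) ** 2
    scale = (area_ratio * (1 - jitter), area_ratio * (1 + jitter))
    top, left, h, w = sample_crop_box(rng, context.shape[0], scale, ratio)
    return resize_nearest(context[top:top + h, left:left + w], target), (h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tile = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    context = rng.integers(0, 256, size=(448, 448, 3), dtype=np.uint8)
    _, crop_hw = crop_and_resize(rng, tile)
    _, shift_hw = extended_context_translation(rng, context)
    print("crop-and-resize source crop (h, w):", crop_hw)
    print("extended-context source crop (h, w):", shift_hw)
```

With these assumed settings, the extended-context crop stays within roughly 10% of the 224-pixel target on each side, so the apparent magnification of the tissue changes very little, whereas crop-and-resize can magnify a much smaller region. A production pipeline would more likely use a library transform with proper interpolation; nearest-neighbour resizing is used here only to keep the sketch dependency-free.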