Leveraging Spatial Context for Positive Pair Sampling in Histopathology Image Representation Learning

Willmer Rafell Quinones Robles, Sakonporn Noree, Jongwoo Kim, Young Sin Ko, Bryan Wong, Mun Yong Yi

TL;DR

This work proposes a spatial context-driven positive pair sampling strategy that enhances SSL by leveraging the morphological coherence of spatially adjacent patches within WSIs and provides a biologically meaningful enhancement for pretraining models in annotation-limited settings.

Abstract

Deep learning has shown strong potential in cancer classification from whole-slide images (WSIs), but the need for extensive expert annotations often limits its success. Annotation-free approaches, such as multiple instance learning (MIL) and self-supervised learning (SSL), have emerged as promising alternatives to traditional annotation-based methods. However, conventional SSL methods typically rely on synthetic data augmentations, which may fail to capture the spatial structure critical to histopathology. In this work, we propose a spatial context-driven positive pair sampling strategy that enhances SSL by leveraging the morphological coherence of spatially adjacent patches within WSIs. Our method is modular and compatible with established joint embedding SSL frameworks, including Barlow Twins, BYOL, VICReg, and DINOv2. We evaluate its effectiveness on both slide-level classification using MIL and patch-level linear probing. Experiments across four datasets demonstrate consistent performance improvements, with accuracy gains of 5% to 10% compared to standard augmentation-based sampling. These findings highlight the value of spatial context in improving representation learning for computational pathology and provide a biologically meaningful enhancement for pretraining models in annotation-limited settings. The code is available at https://anonymous.4open.science/r/contextual-pairs-E72F/.

Paper Structure

This paper contains 14 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Standard SSL forms positives via augmentations of a single anchor patch. Our method additionally samples spatially adjacent patches within a predefined neighborhood to form contextual positive pairs (green).
  • Figure 2: Percentage of mismatches as a function of Chebyshev distance.
  • Figure 3: t-SNE of patch embeddings on the Private (stomach) dataset using VICReg (first 2 subplots) and DINOv2 (last 2 subplots). Contextual sampling (second and fourth subplots) yields higher NMI than standard training.
  • Figure 4: Accuracy gain (%) relative to the baseline (i.e., without contextual information). Results are averaged over five runs. Dark blue, orange, green, and purple lines correspond to Barlow Twins (BT), BYOL, VICReg, and DINOv2, respectively.
  • Figure 5: Effect of neighboring distance on slide-level classification accuracy measured via accuracy gain (%). Error regions represent the standard deviation. Dark blue, orange, green, and purple lines correspond to Barlow Twins (BT), BYOL, VICReg, and DINOv2, respectively.
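The neighborhood sampling that Figures 1, 2, and 5 refer to can be illustrated with a minimal sketch. It assumes patches are indexed by integer (row, col) positions on the slide grid and that a set of tissue-containing positions is available. The function names, the `max_dist` parameter, and the fallback to the anchor when no neighbor exists are illustrative assumptions, not taken from the authors' released code.

```python
import random


def chebyshev(a, b):
    """Chebyshev (chessboard) distance between two (row, col) grid positions."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def sample_contextual_positive(anchor, tissue_coords, max_dist=1, rng=random):
    """Pick a tissue patch within `max_dist` (Chebyshev) of `anchor`.

    `tissue_coords` is a set of (row, col) positions that contain tissue.
    The anchor itself is excluded from the candidates. If no neighbor
    exists, the anchor is returned, so the pair would reduce to a standard
    augmentation-only positive (an assumption made for this sketch).
    """
    r, c = anchor
    candidates = [
        (r + dr, c + dc)
        for dr in range(-max_dist, max_dist + 1)
        for dc in range(-max_dist, max_dist + 1)
        if (dr, dc) != (0, 0) and (r + dr, c + dc) in tissue_coords
    ]
    return rng.choice(candidates) if candidates else anchor


# Hypothetical 3x3 tissue region; sample a neighbor of the center patch.
tissue = {(r, c) for r in range(3) for c in range(3)}
neighbor = sample_contextual_positive((1, 1), tissue, max_dist=1)
```

In an SSL pipeline, the anchor and the sampled neighbor would each be augmented and passed to the two branches of the joint embedding framework. Increasing `max_dist` widens the neighborhood, which is the quantity varied in Figure 5.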