Rotation-Agnostic Image Representation Learning for Digital Pathology

Saghir Alfasly; Abubakr Shafique; Peyman Nejat; Jibran Khan; Areej Alsaafin; Ghazal Alabtah; H. R. Tizhoosh

TL;DR

This work tackles the scalability and reliability challenges of digital pathology by combining a fast patch selection method (FPS), a lightweight, histopathology-specific Vision Transformer (PathDino), and a rotation-agnostic self-supervised training scheme (HistoRotate). The methods are validated on 12 datasets, where PathDino-512 delivers strong WSI-level and patch-level retrieval performance and FPS reduces computational cost while maintaining accuracy. The combination yields results competitive with or superior to state-of-the-art histopathology transformers, including an average 8.5% improvement in patch-level majority-vote performance after pretraining on 6 million TCGA patches. Overall, the framework offers a practical path to scalable, robust digital pathology analysis with reduced overfitting and lower resource demands.

Abstract

This paper addresses complex challenges in histopathological image analysis through three key contributions. Firstly, it introduces a fast patch selection method, FPS, for whole-slide image (WSI) analysis, significantly reducing computational cost while maintaining accuracy. Secondly, it presents PathDino, a lightweight histopathology feature extractor with a minimal configuration of five Transformer blocks and only 9 million parameters, markedly fewer than alternatives. Thirdly, it introduces a rotation-agnostic representation learning paradigm using self-supervised learning, effectively mitigating overfitting. We also show that our compact model outperforms existing state-of-the-art histopathology-specific vision transformers on 12 diverse datasets, including both internal datasets spanning four sites (breast, liver, skin, and colorectal) and seven public datasets (PANDA, CAMELYON16, BRACS, DigestPath, Kather, PanNuke, and WSSS4LUAD). Notably, even with a training dataset of 6 million histopathology patches from The Cancer Genome Atlas (TCGA), our approach demonstrates an average 8.5% improvement in patch-level majority vote performance. These contributions provide a robust framework for enhancing image analysis in digital pathology, rigorously validated through extensive evaluation. Project Page: https://kimialabmayo.github.io/PathDino-Page/
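The rotation-agnostic idea can be illustrated with a small sketch of a full-circle rotation augmentation. This is only an illustration under stated assumptions, not the authors' implementation: the function names `rotate_patch` and `random_rotation`, the nearest-neighbour sampling, and the constant fill for pixels that rotate in from outside the patch are all hypothetical choices made to keep the example dependency-free.

```python
import math
import random


def rotate_patch(patch, angle_deg, fill=0):
    """Rotate a square 2D patch (list of rows) about its centre.

    Uses inverse mapping with nearest-neighbour sampling; output pixels
    whose source falls outside the patch receive `fill` (an assumption).
    """
    n = len(patch)
    c = (n - 1) / 2.0  # centre coordinate along each axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Find the source pixel that lands on output position (x, y).
            dx, dy = x - c, y - c
            sx = cos_t * dx + sin_t * dy + c
            sy = -sin_t * dx + cos_t * dy + c
            ix, iy = round(sx), round(sy)
            if 0 <= ix < n and 0 <= iy < n:
                out[y][x] = patch[iy][ix]
    return out


def random_rotation(patch, rng=random):
    """Sample an angle uniformly from [0, 360) and rotate the patch by it."""
    angle = rng.random() * 360.0
    return rotate_patch(patch, angle), angle


if __name__ == "__main__":
    demo = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    rotated, angle = random_rotation(demo, random.Random(0))
    print(f"angle={angle:.1f}", rotated)
```

For angles that are not multiples of 90 degrees, the corners of the output are filled with a constant; a training pipeline would typically crop or pad to avoid such borders, and the choice made in the paper is not specified in this summary.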

Paper Structure (14 sections, 11 equations, 10 figures, 18 tables)

Introduction
Related Work
Proposed Method
FPS: Fast Patch Selection
HistoRotate: Rotation-Agnostic Training
PathDino: A Histopathology-specific Vision Transformer
Experiment Setup
Experimental Results
FPS Effectiveness
FPS Efficiency
PathDino - WSI-Level Search
PathDino - Patch-Level Search
PathDino - Patch-level 5-Fold Cross-Validation
Conclusions

Figures (10)

Figure 1: HistoRotate. A $360^\circ$ rotation augmentation for training models on histopathology images. Unlike natural images, where rotation may change the context of the visual data, histopathology images can be rotated during training, which improves the learning of discriminative embeddings.
Figure 2: The WSI Analysis Pipeline. (A) The fast patch selection method, FPS, selects a set of representative patches while preserving spatial distribution. (B) HistoRotate is a $360^\circ$ rotation augmentation for histopathology model training that enhances learning without altering contextual information. (C) PathDino is a compact histopathology Transformer with five small vision transformer blocks and $\approx 9$ million parameters, significantly leaner than alternatives.
Figure 3: PathDino vs. its counterparts. Number of parameters (millions) vs. patch-level retrieval performance, measured as the macro-averaged $F_1$ score of majority vote (MV@5), on the CAMELYON16 dataset. The bubble size represents the FLOPs.
Figure 4: Attention Visualization. When visualizing attention maps, our PathDino transformer outperforms HIPT-small and DinoSSLPath, despite being trained on a smaller dataset of $6$ million TCGA patches. In contrast, DinoSSLPath and HIPT were trained on much larger datasets, with $19$ million and $104$ million TCGA patches, respectively.
Figure 5: Embedding variance analysis on the PANDA dataset of three Transformer-based histopathology feature extractors with an output vector size of $384$: HIPT, DinoSSLPath, and our PathDino.
...and 5 more figures
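
Figure 3 reports patch-level retrieval as a macro-averaged F1 score of majority vote (MV@5). The sketch below shows one plausible way to compute such a metric, assuming that MV@5 means taking the most frequent label among the five nearest retrieved patches. The function names, the tie-breaking rule, and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote_at_k(ranked_labels, k=5):
    """Predict the most common label among the top-k retrieved neighbours.

    Ties go to the label that appears earliest in the ranking
    (an assumption made for this sketch).
    """
    top = ranked_labels[:k]
    if not top:
        raise ValueError("ranked_labels must contain at least one label")
    counts = Counter(top)
    best = max(counts.values())
    for label in top:
        if counts[label] == best:
            return label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Each query has a list of neighbour labels, ordered by similarity.
    retrieved = [
        ["tumor", "normal", "tumor", "normal", "tumor", "normal"],
        ["normal", "normal", "tumor", "normal", "tumor"],
    ]
    truth = ["tumor", "normal"]
    preds = [majority_vote_at_k(r, k=5) for r in retrieved]
    print(preds, macro_f1(truth, preds))
```

Macro averaging weights every class equally, so a rare tissue class affects the score as much as a common one.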
