Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat, Jibran Khan, Areej Alsaafin, Ghazal Alabtah, H. R. Tizhoosh
TL;DR
This work tackles scalability and reliability challenges in digital pathology by combining a fast patch selection method (FPS), a lightweight histopathology-specific Vision Transformer (PathDino), and a rotation-agnostic self-supervised learning scheme (HistoRotate). The methods are evaluated on 12 datasets, where PathDino-512 delivers strong WSI-level and patch-level retrieval performance, and FPS reduces computational cost while maintaining accuracy. The compact model matches or outperforms state-of-the-art histopathology-specific vision transformers, with an average 8.5% improvement in patch-level majority-vote performance even with a training set of 6 million TCGA patches. Overall, the framework offers a practical path to scalable, robust digital pathology analysis with reduced overfitting and lower resource demands.
Abstract
This paper addresses complex challenges in histopathological image analysis through three key contributions. Firstly, it introduces a fast patch selection method, FPS, for whole-slide image (WSI) analysis, significantly reducing computational cost while maintaining accuracy. Secondly, it presents PathDino, a lightweight histopathology feature extractor with a minimal configuration of five Transformer blocks and only 9 million parameters, markedly fewer than alternatives. Thirdly, it introduces a rotation-agnostic representation learning paradigm using self-supervised learning, effectively mitigating overfitting. We also show that our compact model outperforms existing state-of-the-art histopathology-specific vision transformers on 12 diverse datasets, including both internal datasets spanning four sites (breast, liver, skin, and colorectal) and seven public datasets (PANDA, CAMELYON16, BRACS, DigestPath, Kather, PanNuke, and WSSS4LUAD). Notably, even with a training dataset of 6 million histopathology patches from The Cancer Genome Atlas (TCGA), our approach demonstrates an average 8.5% improvement in patch-level majority vote performance. These contributions provide a robust framework for enhancing image analysis in digital pathology, rigorously validated through extensive evaluation. Project Page: https://kimialabmayo.github.io/PathDino-Page/
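To make the patch-level majority-vote metric concrete, the following is a minimal sketch of how such a vote could be computed in a retrieval setting: a query patch embedding is compared against a labelled database of patch embeddings, and the predicted label is the most common label among the top-k nearest neighbours. The function names, the use of cosine similarity, and the value of k are illustrative assumptions and are not taken from the paper; the embeddings would in practice come from a feature extractor such as PathDino.

```python
from collections import Counter
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def majority_vote_top_k(query, database, k=3):
    """Predict a label for `query` by majority vote over its k nearest patches.

    `database` is a list of (embedding, label) pairs. The similarity measure
    and k are assumptions made for this sketch, not the paper's settings.
    """
    # Rank database patches from most to least similar to the query.
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    # The most frequent label among the top-k neighbours wins the vote.
    return Counter(top_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real patch features.
    toy_database = [
        ([1.0, 0.0], "tumor"),
        ([0.9, 0.1], "tumor"),
        ([0.7, 0.3], "tumor"),
        ([0.0, 1.0], "normal"),
        ([0.1, 0.9], "normal"),
    ]
    print(majority_vote_top_k([0.8, 0.2], toy_database, k=3))
```

Averaging the correctness of this vote over all query patches gives a majority-vote accuracy; the exact evaluation protocol used for the reported 8.5% improvement is described in the full paper rather than in this abstract.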
