Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

TL;DR

Pan-cancer histopathology analysis requires effective slide-level representations, which existing self-supervised methods often fail to capture because they focus on patch-level features. We introduce PAMA, a position-aware masked autoencoder that learns WSI-level representations through a slide-level masked image modeling (MIM) proxy task, augmented by a position-aware cross-attention (PACA) module, a kernel reorientation (KRO) strategy, and an anchor dropout (AD) mechanism. PAMA encodes WSIs as patch features with anchors and spatial cues (distance and polar angles), reconstructs masked tokens via an asymmetric decoder, and optimizes with a slide-level MSE loss; the X_class token supports downstream tasks. Across seven large-scale datasets, PAMA achieves superior pan-cancer pre-training, demonstrates strong in-domain and out-of-domain generalization, and enables effective semi-supervised WSI classification, highlighting its practical value for robust, data-efficient histopathology analysis.
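
The spatial cues and the reconstruction objective mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `relative_position` and `masked_mse` are hypothetical, the angle convention (radians in [0, 2*pi)) is an assumption, and restricting the MSE to masked tokens follows common masked-autoencoder practice rather than anything stated in this summary.

```python
import math

def relative_position(patch_xy, anchor_xy):
    """Return (distance, polar_angle) tables indexed as [patch][anchor].

    Hypothetical helper: coordinates are plain (x, y) tuples on the slide grid.
    """
    dist, angle = [], []
    for px, py in patch_xy:
        d_row, a_row = [], []
        for ax, ay in anchor_xy:
            dx, dy = px - ax, py - ay
            d_row.append(math.hypot(dx, dy))
            # atan2 returns an angle in (-pi, pi]; shift it into [0, 2*pi)
            a_row.append(math.atan2(dy, dx) % (2 * math.pi))
        dist.append(d_row)
        angle.append(a_row)
    return dist, angle

def masked_mse(target, prediction, masked_idx):
    """Mean squared error over the masked token indices only (assumed convention)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for t, p in zip(target[i], prediction[i]):
            total += (t - p) ** 2
            count += 1
    return total / count

# Two patches described relative to a single anchor at the origin.
dist, angle = relative_position([(3, 4), (0, -1)], [(0, 0)])
# Toy features: token 1 is masked and was reconstructed as all zeros.
loss = masked_mse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [0.0, 0.0]], [1])
```

In a real pipeline the distance and angle values would typically be discretized and embedded before entering the attention module; that step is omitted here because this summary does not describe it.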

Abstract

Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images primarily focus on learning patch features, while there is a notable gap in the availability of pre-training models specifically designed for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation pre-training with the designed position-aware masked autoencoder (PAMA). Meanwhile, we propose the position-aware cross-attention (PACA) module with a kernel reorientation (KRO) strategy and an anchor dropout (AD) mechanism. The KRO strategy can capture the complete semantic structure and eliminate ambiguity in WSIs, and the AD contributes to enhancing the robustness and generalization of the model. We evaluated our method on 7 large-scale datasets from multiple organs for pan-cancer classification tasks. The results have demonstrated the effectiveness and generalization of PAMA in discriminative WSI representation learning and pan-cancer WSI pre-training. The proposed method was also compared with 8 WSI analysis methods. The experimental results have indicated that our proposed PAMA is superior to the state-of-the-art methods. The code and checkpoints are available at https://github.com/WkEEn/PAMA.

Paper Structure

This paper contains 27 sections, 4 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Framework of pan-cancer WSI pre-training and task-specific fine-tuning, where (I) is the data pre-processing in which the spatial and structural information are constructed for the WSI feature, (II) displays the pre-training stage based on reconstructing the slide representation on label-free multi-organ datasets, and (III) is the fine-tuning of the encoder on weakly-supervised task-specific data for practical inference.
  • Figure 2: Illustrations of each structure in PAMA, where (I) describes the workflow of WSI representation self-supervised learning with PAMA, including the encoder, decoder, and slide representation reconstruction, and (II) is the structure of the position-aware cross-attention (PACA) module, which is the core of PAMA. The kernel reorientation (KRO) strategy is described in Algorithm 1, and the detailed process of anchor dropout is described in the anchor dropout (AD) section.
  • Figure 3: Illustration of the proposed kernel reorientation (KRO) strategy, where we show the KRO process for one anchor and highlight it with shading for clarity.
  • Figure 4: Improvement on the long-tailed dataset, where (a) shows the category distribution of the imbalanced Endometrium-3k dataset, and (b), (c), and (d) show the ROC curves of each category without pre-training, with pre-training on the single dataset, and with pre-training on multi-organ datasets using DINO patch features, respectively.
  • Figure 5: Performance of different pre-training conditions of PAMA and Prov-GigaPath on various downstream tasks.
  • ...and 4 more figures