Learning from the Right Patches: A Two-Stage Wavelet-Driven Masked Autoencoder for Histopathology Representation Learning

Raneen Younis, Louay Hamdi, Lukas Chavez, Zahra Ahmadi

TL;DR

Whole-slide images pose computational and labeling challenges for histopathology. WISE-MAE addresses them with a two-stage, wavelet-guided patch selection scheme: it screens slides at low magnification to identify informative regions, then trains a masked autoencoder on high-resolution patches drawn from those regions, with optional contrastive learning and MIL-based downstream classification. The approach yields competitive accuracy and AUC on NSCLC, RCC, and CAMELYON16 while using a lightweight ViT-Base backbone and reduced data requirements, and it demonstrates robust cross-domain transfer. Overall, WISE-MAE improves data efficiency, generalization, and robustness in deep learning for digital pathology by aligning patch sampling with tissue morphology and diagnostic workflows.

Abstract

Whole-slide images are central to digital pathology, yet their extreme size and scarce annotations make self-supervised learning essential. Masked Autoencoders (MAEs) with Vision Transformer backbones have recently shown strong potential for histopathology representation learning. However, conventional random patch sampling during MAE pretraining often includes irrelevant or noisy regions, limiting the model's ability to capture meaningful tissue patterns. In this paper, we present a lightweight and domain-adapted framework that brings structure and biological relevance into MAE-based learning through a wavelet-informed patch selection strategy. WISE-MAE applies a two-step coarse-to-fine process: wavelet-based screening at low magnification to locate structurally rich regions, followed by high-resolution extraction for detailed modeling. This approach mirrors the diagnostic workflow of pathologists and improves the quality of learned representations. Evaluations across multiple cancer datasets, including lung, renal, and colorectal tissues, show that WISE-MAE achieves competitive representation quality and downstream classification performance while maintaining efficiency under weak supervision.

Paper Structure

This paper contains 22 sections, 6 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Illustration of hierarchical analysis in computational pathology. Left: A pathologist examines a whole-slide image and zooms into regions of diagnostic interest, a process mimicked in computational models. Right: Multi-resolution representation of the same WSI across magnification levels. Lower layers correspond to higher magnification with finer tissue detail, reflecting the increasing structural information accessed through zooming.
  • Figure 2: Overview of the WISE-MAE framework. The process begins with Stage 1 patch sampling at $5\times$ magnification (low resolution), where wavelet energy is computed for each patch to assess morphological richness. The top-$k\%$ patches with the highest energy scores are selected. In Stage 2, these selected regions are revisited at $40\times$ magnification (high resolution) to extract fine-grained patches containing rich tissue detail. These high-resolution patches are then used to train a Masked Autoencoder (MAE) in a self-supervised fashion. The learned encoder representations are subsequently used in an MIL framework for downstream classification tasks.
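
The Stage 1 screening described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not name the wavelet family or the exact energy definition, so the sketch assumes a single-level Haar transform and defines energy as the sum of squared detail coefficients. The function names (haar_detail_energy, select_top_k_percent, to_high_res_coords) are hypothetical, and the coordinate mapping assumes that the 5× and 40× levels share an origin and differ only by the magnification ratio.

```python
import numpy as np


def haar_detail_energy(patch: np.ndarray) -> float:
    """Sum of squared detail coefficients of a single-level 2D Haar transform.

    Assumption: the paper's "wavelet energy" is approximated here by the
    energy of the three detail sub-bands of an orthonormal Haar transform.
    """
    x = patch.astype(np.float64)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]  # crop to even size so 2x2 blocks tile the patch

    # Corners of every non-overlapping 2x2 block.
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]

    # The three detail sub-bands (row, column, and diagonal differences).
    row_detail = (a + b - c - d) / 2.0
    col_detail = (a - b + c - d) / 2.0
    diag_detail = (a - b - c + d) / 2.0
    return float((row_detail**2 + col_detail**2 + diag_detail**2).sum())


def select_top_k_percent(patches, k_percent: float):
    """Return indices of the top-k% patches by wavelet energy, plus all energies."""
    energies = np.array([haar_detail_energy(p) for p in patches])
    n_keep = max(1, int(np.ceil(len(patches) * k_percent / 100.0)))
    order = np.argsort(-energies, kind="stable")  # highest energy first
    return order[:n_keep], energies


def to_high_res_coords(x: int, y: int, low_mag: int = 5, high_mag: int = 40):
    """Map a low-magnification pixel coordinate to the high-magnification level.

    Assumption: both levels share an origin and differ only by the
    magnification ratio (8 for 5x -> 40x).
    """
    scale = high_mag // low_mag
    return x * scale, y * scale


# Toy grayscale "patches": two flat regions, a checkerboard, and faint stripes.
rows, cols = np.indices((8, 8))
flat_dark = np.zeros((8, 8))
checkerboard = ((rows + cols) % 2).astype(float)
faint_stripes = 0.5 * (rows % 2)
flat_bright = np.ones((8, 8))

patches = [flat_dark, checkerboard, faint_stripes, flat_bright]
selected, energies = select_top_k_percent(patches, k_percent=50)
```

On these toy inputs the two flat patches score zero energy and are discarded, while the checkerboard and the striped patch are kept. In a real pipeline, each selected low-magnification location would then be mapped to the high-magnification level (here via to_high_res_coords) and cropped into patches for MAE pretraining.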