
Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images

Zhengrui Guo, Qichen Sun, Jiabo Ma, Lishuang Feng, Jinzhuo Wang, Hao Chen

TL;DR

Querent proposes a query-aware, dynamic long-context modeling framework for gigapixel WSIs that retains the expressive power of full self-attention while dramatically reducing computation via region-level metadata and selective attention. The method introduces min-max region summarization, a region-importance estimator, and query-guided attention to achieve near-linear scaling for long sequences. Theoretical guarantees bound the approximation error to full self-attention, and extensive experiments across biomarker, mutation, subtyping, and survival tasks show state-of-the-art performance on over 10 WSI datasets. These contributions offer a practical, scalable solution for deep learning in computational pathology, with potential for broad clinical impact after further validation.
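Min-max region summarization keeps, for each region, the channel-wise minimum and maximum over its patch features (see also Figure 3). The following is a minimal NumPy sketch under simplifying assumptions: patch features are taken to be already ordered so that every run of `region_size` consecutive rows forms one region, which stands in for the paper's spatial partition, and `summarize_regions` is a hypothetical helper, not part of the released code.

```python
import numpy as np


def summarize_regions(features: np.ndarray, region_size: int):
    """Per-region min-max metadata (hypothetical helper, not the Querent API).

    features: (N, D) patch embeddings, assumed ordered so that every run of
    `region_size` consecutive rows is one region; the last region may be shorter.
    Returns (mins, maxs), each of shape (ceil(N / region_size), D).
    """
    n, d = features.shape
    n_regions = -(-n // region_size)  # ceiling division
    mins = np.empty((n_regions, d), dtype=features.dtype)
    maxs = np.empty((n_regions, d), dtype=features.dtype)
    for r in range(n_regions):
        block = features[r * region_size:(r + 1) * region_size]
        mins[r] = block.min(axis=0)  # channel-wise minimum over the region's patches
        maxs[r] = block.max(axis=0)  # channel-wise maximum over the region's patches
    return mins, maxs


# Five 2-D patch features grouped into regions of two patches.
x = np.array([[1.0, 5.0], [3.0, -2.0], [0.0, 4.0], [7.0, 1.0], [2.0, 2.0]])
mins, maxs = summarize_regions(x, region_size=2)
print(mins)  # rows: [1, -2], [0, 1], [2, 2]
print(maxs)  # rows: [3, 5], [7, 4], [2, 2]
```

With N patches, regions of K patches, and D channels, the metadata holds roughly 2·D·N/K numbers, so later importance estimation scales with the number of regions rather than the number of patches.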

Abstract

Whole slide image (WSI) analysis presents significant computational challenges due to the massive number of patches in gigapixel images. While transformer architectures excel at modeling long-range correlations through self-attention, their quadratic computational complexity makes them impractical for computational pathology applications. Existing solutions like local-global or linear self-attention reduce computational costs but compromise the strong modeling capabilities of full self-attention. In this work, we propose Querent, a query-aware long-context dynamic modeling framework that achieves a theoretically bounded approximation of full self-attention while delivering practical efficiency. Our method adaptively predicts which surrounding regions are most relevant for each patch, enabling focused yet unrestricted attention computed only over potentially important contexts. By using efficient region-wise metadata computation and importance estimation, our approach dramatically reduces computational overhead while preserving the global perception needed to model fine-grained patch correlations. Through comprehensive experiments on biomarker prediction, gene mutation prediction, cancer subtyping, and survival analysis across more than 10 WSI datasets, our method demonstrates superior performance compared with state-of-the-art approaches. Code is available at https://github.com/dddavid4real/Querent.
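Region-wise importance estimation can be illustrated with one plausible scoring rule. Assuming each region is summarized by the channel-wise minimum and maximum of its key vectors, the score sum over d of max(q_d * min_d, q_d * max_d) is an upper bound on q·k for every key in that region, and the highest-scoring regions are kept. This rule is borrowed from query-aware KV-cache selection and is our assumption; Querent's own estimator may differ, and `region_minmax`, `region_scores`, and `select_regions` are hypothetical names.

```python
import numpy as np


def region_minmax(keys: np.ndarray, region_size: int):
    """Channel-wise min and max of the keys in each contiguous region."""
    n_regions = -(-keys.shape[0] // region_size)  # ceiling division
    chunks = [keys[r * region_size:(r + 1) * region_size] for r in range(n_regions)]
    return np.stack([c.min(axis=0) for c in chunks]), np.stack([c.max(axis=0) for c in chunks])


def region_scores(query: np.ndarray, mins: np.ndarray, maxs: np.ndarray) -> np.ndarray:
    """Upper bound on query @ key for any key inside each region, shape (R,).

    Each key channel lies in [min, max], and q_d * k_d is linear in k_d,
    so its largest value is attained at one of the two endpoints.
    """
    return np.maximum(query * mins, query * maxs).sum(axis=1)


def select_regions(query, mins, maxs, top_k: int) -> np.ndarray:
    """Indices of the top_k regions with the highest estimated importance."""
    scores = region_scores(query, mins, maxs)
    return np.argsort(-scores)[:min(top_k, scores.shape[0])]


rng = np.random.default_rng(0)
keys = rng.standard_normal((12, 4))  # 12 patches, 4 channels -> 3 regions of 4 patches
q = rng.standard_normal(4)
mins, maxs = region_minmax(keys, region_size=4)
exact_best = (keys @ q).reshape(3, 4).max(axis=1)  # true best q.k within each region
assert np.all(region_scores(q, mins, maxs) >= exact_best - 1e-12)
print(select_regions(q, mins, maxs, top_k=2))
```

Because the score never underestimates the best match inside a region, a region containing a strongly matching patch cannot receive a low score; the cost is that loose min-max envelopes can overrate some regions.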

Paper Structure

This paper contains 41 sections, 4 theorems, 42 equations, 6 figures, 3 tables.

Key Result

Theorem 3.1

Let $\mathbf{A}$ be the query-aware attention matrix (Definition 2.2), and $\mathbf{B}$ be the full self-attention matrix (Definition 2.1). Assume attention scores decay exponentially with spatial distance: $\exp(-\alpha d(i,j))$ bounds the attention score decay for distance $d(i,j)$. For

Figures (6)

  • Figure 1: Illustration of context-dependent patch relationships in whole slide images. A benign patch (A) shows low correlation with the cancerous patch (C), whereas a cancerous patch (B) shows high correlation with it. The same patch (C) can therefore have fundamentally different relationships with other patches depending on the biological context.
  • Figure 2: Illustration of the proposed Querent framework, which models a WSI via four key steps: (1) region-level metadata summarization from the partitioned WSI, detailed in Fig. 3, (2) identification of relevant regions for query patches through efficient importance scoring, (3) query-aware selective self-attention computation between each query patch and the patches in the selected regions, and (4) feature aggregation with attentive pooling for final prediction. The framework enables dynamic modeling of long-range contextual relationships in gigapixel WSIs through efficient region relevance identification and query-aware selective attention computation.
  • Figure 3: Illustration of the region-level metadata summarization process. Each region from the WSI is represented by summary vectors computed from its constituent patches. These summary vectors capture the statistical characteristics (minimum and maximum values) across all patches within each region, providing an efficient representation for subsequent importance estimation.
  • Figure 4: Ablation on Querent using min, max, mean, and mean $\pm$ std strategies compared to our min-max method on the TCGA-LUAD TP53 gene mutation dataset (details in the Appendix).
  • Figure 5: Ablation on Querent using different region size $K$, with reported results on TCGA-LUAD for TP53 gene mutation prediction and UBC-OCEAN for ovarian cancer subtyping.
  • ...and 1 more figure
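The four steps in the Figure 2 caption can be sketched end to end. This is a hypothetical NumPy illustration, not the released Querent code, and it rests on stated assumptions: regions are contiguous slices of a 1-D patch sequence, region importance uses a min-max upper bound on q·k (a rule we assume; the paper's estimator may differ), and pooling is a simplified attention pooling with one scoring vector. `querent_like_forward`, `region_size`, `top_k`, and `w_pool` are our names.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def querent_like_forward(x, Wq, Wk, Wv, w_pool, region_size, top_k):
    """Hypothetical sketch: each patch attends only to patches in its top_k regions.

    x: (N, D) patch features; Wq, Wk, Wv: (D, D) projections; w_pool: (D,).
    Returns a (D,) slide-level embedding.
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    n_regions = -(-n // region_size)
    bounds = [(r * region_size, min((r + 1) * region_size, n)) for r in range(n_regions)]
    # Step 1: min-max metadata of the keys in each region, shape (R, D).
    kmin = np.stack([k[s:e].min(axis=0) for s, e in bounds])
    kmax = np.stack([k[s:e].max(axis=0) for s, e in bounds])
    # Step 2: per-query region importance (upper bound on q.k), shape (N, R).
    scores = np.maximum(q[:, None, :] * kmin[None], q[:, None, :] * kmax[None]).sum(-1)
    top_k = min(top_k, n_regions)
    out = np.empty_like(v)
    for i in range(n):
        chosen = np.argsort(-scores[i])[:top_k]
        idx = np.concatenate([np.arange(*bounds[r]) for r in chosen])
        # Step 3: scaled dot-product attention restricted to the selected patches.
        attn = softmax(q[i] @ k[idx].T / np.sqrt(d))
        out[i] = attn @ v[idx]
    # Step 4: attention pooling over all patch outputs into one slide vector.
    a = softmax(np.tanh(out) @ w_pool)  # (N,) pooling weights
    return a @ out


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
w = rng.standard_normal(4)
slide = querent_like_forward(x, Wq, Wk, Wv, w, region_size=3, top_k=2)
print(slide.shape)
```

Setting `top_k` to the number of regions makes every query see every patch, so the sketch reduces to full self-attention followed by the same pooling; smaller `top_k` trades exactness for cost.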

Theorems & Definitions (11)

  • Theorem 3.1: Query-Aware Attention Approximation
  • proof
  • Definition 2.1: Full Self-Attention Matrix
  • Definition 2.2: Query-Aware Attention Matrix
  • Definition 2.3: Region-Level Metadata
  • Lemma 2.4: Region Metadata Approximation
  • proof
  • Lemma 2.5: Ranking Stability
  • proof: Proof of Lemma 2.5
  • Theorem 2.6: Query-Aware Attention Approximation
  • ...and 1 more
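Definition 2.3 and Lemma 2.4 concern region-level metadata. One elementary fact that makes min-max summaries usable for importance estimation is that they bound every query-key score inside a region. It is stated here in our own notation, assuming each region $\mathcal{R}_r$ is summarized by the channel-wise extremes of its key vectors $\mathbf{k}_j$; the paper's exact lemma is not reproduced.

```latex
\begin{aligned}
m^{(r)}_d &= \min_{j \in \mathcal{R}_r} k_{j,d}, \qquad
M^{(r)}_d = \max_{j \in \mathcal{R}_r} k_{j,d}, \\
\mathbf{q}_i^\top \mathbf{k}_j &= \sum_{d=1}^{D} q_{i,d}\,k_{j,d}
  \;\le\; \sum_{d=1}^{D} \max\!\big(q_{i,d}\,m^{(r)}_d,\; q_{i,d}\,M^{(r)}_d\big) =: s_{i,r}
  \qquad \text{for all } j \in \mathcal{R}_r .
\end{aligned}
```

Each term $q_{i,d}k_{j,d}$ is linear in $k_{j,d}$ on the interval $[m^{(r)}_d, M^{(r)}_d]$, so its maximum is attained at an endpoint; summing the per-channel maxima gives $s_{i,r}$.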