FALFormer: Feature-aware Landmarks self-attention for Whole-slide Image Classification
Doanh C. Bui, Trinh Thi Le Vuong, Jin Tae Kwak
TL;DR
FALFormer addresses a limitation of MIL-based WSI classification, which cannot fully model the relationships among patches, by processing all patches of a slide together with a Transformer built on Nyström self-attention. It introduces feature-aware landmarks self-attention (FALSA), which uses K-means clustering to produce high-quality landmarks and approximates global attention with $\mathcal{O}(N)$ complexity in the number of patches $N$, enabling full-patch interaction within a WSI. On CAMELYON16 and TCGA-BRCA, FALFormer (especially with CTransPath features) achieves state-of-the-art accuracy, F1, and AUC, and ablations confirm the effectiveness of FALSA. The approach offers a practical balance between accuracy and computation, with the potential to improve diagnosis and prognosis in digital pathology by leveraging comprehensive patch relationships.
Abstract
Slide-level classification of whole-slide images (WSIs) is widely recognized as a crucial problem in digital and computational pathology. Because a WSI contains a large number of patches, current approaches commonly treat it as a bag of cropped patches and process them via multiple instance learning, which cannot fully explore the relationships among patches; in other words, global information cannot be fully incorporated into decision making. Herein, we propose an efficient and effective slide-level classification model, named FALFormer, that processes a WSI as a whole so as to fully exploit the relationships among all patches and to improve classification performance. FALFormer is built upon Transformers and the self-attention mechanism. To lessen the computational burden of the original self-attention mechanism and to process all patches in a WSI together, FALFormer employs Nyström self-attention, which approximates the computation by using a smaller number of tokens, or landmarks. For effective learning, FALFormer introduces feature-aware landmarks to enhance the representation power of the landmarks and the quality of the approximation. We systematically evaluate the performance of FALFormer on two public datasets, CAMELYON16 and TCGA-BRCA. The experimental results demonstrate that FALFormer achieves superior performance on both datasets, outperforming state-of-the-art methods for slide-level classification. This suggests that FALFormer can facilitate accurate and precise analysis of WSIs, potentially leading to improved diagnosis and prognosis.
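To make the idea of landmark-based attention concrete, the following is a minimal NumPy sketch of Nyström self-attention with landmarks taken from K-means clusters. It is an illustration under stated assumptions, not the authors' implementation: it assumes that clusters are computed on the input patch features and that the query and key landmarks are the per-cluster means of $Q$ and $K$; it uses a single head, random stand-in features, and random projection weights. The helper names `kmeans_labels`, `landmark_means`, and `nystrom_attention` are hypothetical. The approximation used is the standard Nyström form $\mathrm{softmax}(QK^\top/\sqrt{d})\,V \approx F A^{+} B V$, where $F$, $A$, and $B$ are the softmax kernels between tokens and landmarks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def kmeans_labels(x, m, n_iter=10, seed=0):
    """Plain Lloyd's K-means; returns a cluster index in [0, m) per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=m, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center: (N, m).
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(m):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def landmark_means(t, labels, m):
    """Average the rows of t within each cluster, giving (m, d) landmarks."""
    onehot = np.eye(m)[labels]                    # (N, m) assignment matrix
    counts = np.maximum(onehot.sum(axis=0), 1.0)  # guard against empty clusters
    return (onehot.T @ t) / counts[:, None]


def nystrom_attention(q, k, v, labels, m):
    """Approximate softmax(q k^T / sqrt(d)) v without forming the N x N matrix."""
    d = q.shape[-1]
    q_l = landmark_means(q, labels, m)        # (m, d) query landmarks (assumption)
    k_l = landmark_means(k, labels, m)        # (m, d) key landmarks (assumption)
    f = softmax(q @ k_l.T / np.sqrt(d))       # (N, m) tokens vs. key landmarks
    a = softmax(q_l @ k_l.T / np.sqrt(d))     # (m, m) landmarks vs. landmarks
    b = softmax(q_l @ k.T / np.sqrt(d))       # (m, N) query landmarks vs. tokens
    return f @ (np.linalg.pinv(a) @ (b @ v))  # (N, d)


# Random stand-in for N patch embeddings of dimension d; m landmarks.
rng = np.random.default_rng(0)
N, d, m = 512, 32, 16
x = rng.standard_normal((N, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

labels = kmeans_labels(x, m)
approx = nystrom_attention(q, k, v, labels, m)

# Exact attention, computed here only to compare against the approximation.
exact = softmax(q @ k.T / np.sqrt(d)) @ v
print(approx.shape, float(np.abs(approx - exact).mean()))
```

For a fixed number of landmarks `m`, every matrix product in `nystrom_attention` costs time proportional to `N`, which is where the linear complexity comes from; the quadratic `exact` computation appears only as a reference. How closely the two agree depends on how well the landmarks summarize the tokens, which is the motivation for choosing them from feature clusters rather than from arbitrary segments.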
