Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
Tim Lenz, Peter Neidlinger, Marta Ligero, Georg Wölflein, Marko van Treeck, Jakob Nikolas Kather
TL;DR
COBRA addresses the challenge of learning robust, task-agnostic slide-level representations from pathology whole-slide images (WSIs) without relying on task-specific weak supervision. It does so with a single-modality contrastive self-supervised learning (SSL) method that fuses tile embeddings from multiple foundation models (FMs) via a Mamba-2-based encoder and multi-head gated attention pooling, trained with an InfoNCE loss. The approach achieves state-of-the-art results on external validation with CPTAC cohorts despite being pretrained on only 3048 TCGA WSIs, remains compatible with FMs unseen during pretraining, and gains data efficiency from multi-magnification pretraining. Interpretability comes from tile-wise attention heatmaps, without the need for Grad-CAM, and the FM-agnostic design allows future patch FMs to be integrated flexibly, making slide-level representations more accessible and generalizable in clinical contexts.
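The multi-head gated attention pooling step can be illustrated with a small, dependency-free sketch. It assumes the gated attention form used in attention-based MIL, where each tile embedding h_k receives the score w · (tanh(V h_k) * sigmoid(U h_k)), scores are softmax-normalized over tiles, and each head returns a weighted average of tiles. Averaging the heads, the class and function names, the dimensions, and the random initialization are all hypothetical choices for illustration, not details taken from the COBRA code.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs: Vector) -> Vector:
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class GatedAttentionHead:
    """One gated attention head (hypothetical; randomly initialized, untrained)."""

    def __init__(self, dim: int, hidden: int, rng: random.Random) -> None:
        self.V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w = [rng.gauss(0.0, 0.1) for _ in range(hidden)]

    def scores(self, tiles: List[Vector]) -> Vector:
        """Unnormalized score per tile: w . (tanh(V h) * sigmoid(U h))."""
        out = []
        for h in tiles:
            content = [math.tanh(x) for x in matvec(self.V, h)]
            gate = [sigmoid(x) for x in matvec(self.U, h)]
            out.append(sum(wi * ci * gi for wi, ci, gi in zip(self.w, content, gate)))
        return out


def multi_head_gated_pool(
    tiles: List[Vector], heads: List[GatedAttentionHead]
) -> Tuple[Vector, List[Vector]]:
    """Pool tile embeddings into one slide vector; also return per-head tile weights."""
    dim = len(tiles[0])
    pooled_per_head = []
    attention_maps = []
    for head in heads:
        weights = softmax(head.scores(tiles))  # one weight per tile, sums to 1
        attention_maps.append(weights)
        pooled_per_head.append(
            [sum(a * h[j] for a, h in zip(weights, tiles)) for j in range(dim)]
        )
    # Averaging the heads is an assumption; concatenation is another common choice.
    slide = [sum(p[j] for p in pooled_per_head) / len(heads) for j in range(dim)]
    return slide, attention_maps


if __name__ == "__main__":
    rng = random.Random(0)
    tiles = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 tiles, dim 8
    heads = [GatedAttentionHead(dim=8, hidden=4, rng=rng) for _ in range(2)]
    slide, maps = multi_head_gated_pool(tiles, heads)
    print(len(slide), [round(sum(m), 6) for m in maps])
```

Because each head's weights are non-negative and sum to 1 over the tiles, they can be mapped back to tile positions and displayed directly as a heatmap, which is the kind of attention-based interpretability that needs no gradient-based method such as Grad-CAM.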
Abstract
Representation learning of pathology whole-slide images (WSIs) has primarily relied on weak supervision with Multiple Instance Learning (MIL). This approach yields slide representations that are highly tailored to a specific clinical task. Self-supervised learning (SSL) has been successfully applied to train histopathology foundation models (FMs) for patch embedding generation. However, generating patient- or slide-level embeddings remains challenging. Existing approaches to slide representation learning extend the principles of SSL from patch-level learning to entire slides, either by aligning different augmentations of a slide or by utilizing multimodal data. By integrating tile embeddings from multiple FMs, we propose a new single-modality SSL method in feature space that generates useful slide representations. Our contrastive pretraining strategy, called COBRA, employs multiple FMs and an architecture based on Mamba-2. COBRA exceeds the performance of state-of-the-art slide encoders on four different public Clinical Proteomic Tumor Analysis Consortium (CPTAC) cohorts by at least +4.4% AUC on average, despite being pretrained on only 3048 WSIs from The Cancer Genome Atlas (TCGA). Additionally, COBRA is readily compatible at inference time with previously unseen feature extractors. Code is available at https://github.com/KatherLab/COBRA.
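The contrastive pretraining idea can be sketched with the standard InfoNCE loss. In this minimal, dependency-free sketch, the two views of a slide are assumed, for illustration only, to be slide embeddings computed from the tile features of two different FMs, and the other slides in the batch act as negatives. The view construction, the function names, the use of cosine similarity, and the default temperature of 0.07 are assumptions, not details taken from the COBRA code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector, eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit length (eps guards against zero vectors)."""
    norm = max(math.sqrt(sum(x * x for x in v)), eps)
    return [x / norm for x in v]


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss over a batch (hypothetical helper).

    queries[i] and keys[i] are two views of the same slide (positive pair);
    every keys[j] with j != i acts as a negative for queries[i].
    """
    if len(queries) != len(keys) or not queries:
        raise ValueError("need equally sized, non-empty batches of views")
    qs = [l2_normalize(q) for q in queries]
    ks = [l2_normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(qs):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in ks]
        # Cross-entropy with target index i, via a stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(qs)


if __name__ == "__main__":
    # Hypothetical embeddings of 3 slides, each "seen" through two different FMs.
    view_fm_a = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
    view_fm_b = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    print(round(info_nce(view_fm_a, view_fm_b), 4))
```

A low loss means each slide's two views are more similar to each other than to any other slide in the batch; when all embeddings are identical the loss equals log(batch size), and a smaller temperature makes the loss concentrate more sharply on the hardest negatives.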
