Transcriptomics-guided Slide Representation Learning in Computational Pathology

Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya, Richard J. Chen, Drew F. K. Williamson, Thomas Peeters, Andrew H. Song, Faisal Mahmood

TL;DR

This work tackles the problem of learning slide-level representations from giga-pixel whole-slide images by leveraging transcriptomic data. It introduces Tangle, a multimodal pre-training framework that jointly learns a slide encoder and a gene-expression encoder through symmetric contrastive learning, with optional reconstruction and intra-modality objectives. Trained on cross-species data (rat TG-GATEs and human TCGA samples) across liver, breast, and lung, Tangle achieves substantial improvements in few-shot classification, prototype-based tasks, and slide retrieval compared to MIL and patch-based SSL baselines, while offering interpretability via attention maps and gene-level attributions. The results underscore the value of aligning histology with molecular profiles to produce task-agnostic, biologically meaningful slide representations, with potential impact on preclinical toxicology and cancer pathology workflows.

Abstract

Self-supervised learning (SSL) has been successful in building patch embeddings of small histology images (e.g., 224x224 pixels), but scaling these models to learn slide embeddings from the entirety of giga-pixel whole-slide images (WSIs) remains challenging. Here, we leverage complementary information from gene expression profiles to guide slide representation learning using multimodal pre-training. Expression profiles constitute highly detailed molecular descriptions of a tissue that we hypothesize offer a strong task-agnostic training signal for learning slide embeddings. Our slide and expression (S+E) pre-training strategy, called Tangle, employs modality-specific encoders, the outputs of which are aligned via contrastive learning. Tangle was pre-trained on samples from three different organs: liver (n=6,597 S+E pairs), breast (n=1,020), and lung (n=1,012) from two different species (Homo sapiens and Rattus norvegicus). Across three independent test datasets consisting of 1,265 breast WSIs, 1,946 lung WSIs, and 4,584 liver WSIs, Tangle shows significantly better few-shot performance compared to supervised and SSL baselines. When assessed using prototype-based classification and slide retrieval, Tangle also shows a substantial performance improvement over all baselines. Code available at https://github.com/mahmoodlab/TANGLE.
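As a rough illustration of the alignment step described in the abstract, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired slide and expression embeddings. It is not taken from the Tangle codebase: the function names, the temperature value, and the equal weighting of the two directions are assumptions made for this example. It uses plain Python for readability, whereas a real implementation would use a tensor library.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(slide_emb, expr_emb, temperature=0.1):
    """Hypothetical symmetric contrastive loss for paired embeddings.

    slide_emb[i] and expr_emb[i] are assumed to come from the same sample.
    The temperature and the 0.5/0.5 weighting are assumptions of this sketch.
    """
    s = [l2_normalize(v) for v in slide_emb]
    e = [l2_normalize(v) for v in expr_emb]
    n = len(s)

    # sim[i][j]: temperature-scaled cosine similarity of slide i and expression j.
    sim = [
        [sum(a * b for a, b in zip(s[i], e[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Slide -> expression: each row should put its mass on the diagonal entry.
    loss_s2e = -sum(log_softmax(sim[i])[i] for i in range(n)) / n

    # Expression -> slide: same idea, applied to the columns.
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    loss_e2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n

    return 0.5 * (loss_s2e + loss_e2s)
```

The loss is small when each slide embedding is closest to its own expression embedding and grows when pairs are mismatched, which is the signal that pulls the two encoders into a shared space.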

Paper Structure (20 sections, 4 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Few-shot performance. $\textsc{Tangle}$ linear probing performance compared to multiple instance learning (ABMIL) and intra-modality slide SSL ($\textsc{Intra}$). $\textsc{Tangle}$ uses gene expression (E) to guide slide pre-training (S) using multimodal contrastive learning (S+E). Results on independent cohorts for BRCA subtyping (human breast, n=1,265 WSIs), NSCLC subtyping (human lung, n=1,946 WSIs), and TG-GATEs lesion classification (rat liver, n=4,584 WSIs). k: number of training samples per class.
  • Figure 1: Ablation study on TG-GATEs. a. Ablation of the (S+E) loss of $\textsc{Tangle}$. We compare a symmetric contrastive loss with its non-symmetric counterpart, an L1 loss, and a Mean Squared Error loss. b. Combining $\textsc{Tangle}$ loss with $\textsc{Tangle}$-Rec and $\textsc{Intra}$. c. $\textsc{Intra}$ loss ablation using the average patch embedding, a random other view based on a different patch set, or a combination of both.
  • Figure 2: Overview of $\textsc{Tangle}$ for (S+E) pre-training. An input histology slide is tessellated into patches and encoded using a pre-trained vision encoder. The resulting patch embeddings are passed to an ABMIL module to derive a slide embedding. The corresponding gene expression data are encoded using an MLP. A symmetric contrastive objective $\mathcal{L}_{symCL}$ learns to align embeddings from both modalities. During inference, a query slide is encoded into a slide embedding by the trained pooling module to be used for downstream tasks.
  • Figure 2: Model ablation on TG-GATEs. $\textsc{Tangle}$ training when replacing the ABMIL backbone by TransMIL.
  • Figure 3: Downstream tasks. We test $\textsc{Tangle}$ and baselines on (1) few-shot, (2) prototype-based classification, and (3) slide retrieval.
  • ...and 6 more figures
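
Two of the downstream tasks named in the Figure 3 caption, prototype-based classification and slide retrieval, can be sketched on top of frozen slide embeddings. The sketch below is only illustrative and rests on assumptions not stated in this summary: a class prototype is taken to be the mean of that class's slide embeddings, and both tasks use L2 distance. All function names are hypothetical.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(embeddings, labels):
    """Assumed prototype definition: one mean slide embedding per class."""
    by_class = defaultdict(list)
    for emb, label in zip(embeddings, labels):
        by_class[label].append(emb)
    return {label: mean_vector(vs) for label, vs in by_class.items()}


def classify(query, prototypes):
    """Assign the query slide to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: l2_distance(query, prototypes[label]))


def retrieve(query, database, top_k=3):
    """Return indices of the top_k database slides closest to the query."""
    ranked = sorted(range(len(database)), key=lambda i: l2_distance(query, database[i]))
    return ranked[:top_k]
```

Neither function trains any parameters, so performance on these tasks depends on how well the pre-trained slide embeddings already separate the classes.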