Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction

Fredrik K. Gustafsson; Mattias Rantalainen

TL;DR

This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction and concludes that training a single model to simultaneously regress all 20530 genes is a computationally efficient yet very strong baseline.

Abstract

Prediction of mRNA gene-expression profiles directly from routine whole-slide images (WSIs) using deep learning models could offer cost-effective and widely accessible molecular phenotyping. While such WSI-based gene-expression prediction models have recently emerged within computational pathology, the high-dimensional nature of the corresponding regression problem raises numerous design choices that have yet to be analyzed in detail. This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction. For example, we conclude that training a single model to simultaneously regress all 20530 genes is a computationally efficient yet very strong baseline.

Paper Structure (7 sections, 1 equation, 12 figures, 9 tables)

Feature Extractors
ABMIL
Patch-Level
Model Architecture & Training
Prediction
Supplementary Figures
Supplementary Tables

Figures (12)

Figure 1: Model performance comparison of the four regression models across the four TCGA datasets, using both UNI and Resnet-IN as patch-level feature extractors. Top: mean Pearson correlation over all $N=20530$ genes. Middle: mean Pearson correlation of the top $1000$ genes with the highest regression accuracy. Bottom: number of genes (out of all $N=20530$) regressed with a Pearson correlation of at least $0.4$. Higher is better for all three metrics. All results are mean $\pm$ std (standard deviation) over the 5 cross-validation folds. Raw numerical results for this figure are provided in Tables \ref{tab:main_results_brca} to \ref{tab:main_results_blca} in the supplementary material. The same comparison using Spearman instead of Pearson correlation is shown in Figure \ref{fig:main_results_spearman}.
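The three summary metrics in Figure 1 all derive from one Pearson correlation per gene, computed across the WSIs of a cross-validation fold. A minimal NumPy sketch follows, assuming ground truth and predictions are stacked as arrays of shape (n_wsis, n_genes); the function names are hypothetical, and mapping zero-variance genes to r = 0 is an assumption made here, not a documented choice of the paper.

```python
import numpy as np


def pearson_per_gene(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Pearson r for each gene (column), computed across WSIs (rows)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    # Constant genes give 0/0; return NaN for them without emitting warnings.
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


def summary_metrics(y_true, y_pred, top_k=1000, threshold=0.4):
    """Return (mean r over all genes, mean r over top_k genes, #genes with r >= threshold)."""
    r = np.nan_to_num(pearson_per_gene(y_true, y_pred), nan=0.0)  # assumption: NaN -> 0
    top = np.sort(r)[::-1][:top_k]  # highest correlations first
    return float(r.mean()), float(top.mean()), int((r >= threshold).sum())
```

Passing only the first 800 gene columns with `top_k=50` would mirror the restricted evaluation described for Figure 3.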
Figure 2: Performance comparison across the four TCGA datasets when multiple UNI - Direct - ABMIL models are used together to output a full predicted gene-expression profile $\hat{y}(x) \in \mathbb{R}^N$ for each WSI $x$, with either sequential chunking or clustering used to group the $N = 20530$ genes into subsets (one subset per model). Same metrics as in Figure \ref{fig:main_results}. All results are the mean over the 5 cross-validation folds.
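The two gene-grouping strategies compared in Figure 2 can be sketched as below. Sequential chunking simply splits the gene index range into consecutive blocks. For clustering, the caption does not state the algorithm or the gene representation, so the hypothetical `cluster_genes` helper runs plain Lloyd's k-means on each gene's standardized expression across training WSIs; treat it as an illustrative stand-in only.

```python
import numpy as np


def sequential_chunks(n_genes: int, genes_per_model: int) -> list:
    """Split gene indices 0..n_genes-1 into consecutive chunks; the last may be smaller."""
    return [np.arange(start, min(start + genes_per_model, n_genes))
            for start in range(0, n_genes, genes_per_model)]


def cluster_genes(y_train: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> list:
    """Hypothetical clustering step: Lloyd's k-means on standardized gene profiles.

    y_train has shape (n_wsis, n_genes); each gene is treated as a point in R^n_wsis.
    Returns one index array per cluster (a cluster can end up empty). n_iter must be >= 1.
    """
    g = y_train.T.astype(float)
    g = (g - g.mean(axis=1, keepdims=True)) / (g.std(axis=1, keepdims=True) + 1e-8)
    rng = np.random.default_rng(seed)
    centers = g[rng.choice(len(g), size=n_clusters, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        # Squared Euclidean distances ||g||^2 - 2 g.c + ||c||^2, shape (n_genes, n_clusters).
        d = (g ** 2).sum(1)[:, None] - 2 * g @ centers.T + (centers ** 2).sum(1)[None, :]
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            members = labels == c
            if members.any():  # keep the previous center if a cluster empties
                centers[c] = g[members].mean(axis=0)
    return [np.flatnonzero(labels == c) for c in range(n_clusters)]
```

For example, with a hypothetical 1000 genes per model, `sequential_chunks(20530, 1000)` yields 21 subsets, the last containing 530 genes, so 21 separate models would be needed to cover all genes.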
Figure 3: Performance comparison across the four TCGA datasets when progressively increasing the number of genes regressed per UNI - Direct - ABMIL model, using sequential chunking to group genes into subsets. All models are evaluated only on the subset of the first $800$ genes (to make this evaluation computationally manageable). Same metrics as in Figure \ref{fig:main_results}, except that the middle row shows the mean Pearson correlation of the top $50$ genes with the highest regression accuracy. All results are the mean over the 5 cross-validation folds.
Figure 4: Overview of the three main evaluated models. Top: Direct - ABMIL. Middle: Direct - Patch-level. Bottom: Contrastive. All three models share the same initial WSI processing steps. First, the input WSI $x$ is tissue-segmented and divided into non-overlapping patches $\tilde{x}_i$ of size $256 \times 256$ using CLAM [lu2021data]. Next, a feature vector $p(\tilde{x}_i)$ is extracted for each patch using a pretrained and frozen feature extractor (either UNI [chen2024uni] or Resnet-IN). The models then process these patch-level feature vectors $p(\tilde{x}_i)$ in different ways (see the Methods section for details), finally outputting a predicted gene-expression profile $\hat{y}(x) \in \mathbb{R}^N$ for all $N = 20530$ genes. In all three panels, blue marks the pretrained and frozen feature extractor, and green marks trainable model components.
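To make the Direct - ABMIL pipeline in Figure 4 more concrete, here is a forward-pass sketch of attention-based MIL pooling over the frozen patch features $p(\tilde{x}_i)$, followed by a single linear head that outputs all $N$ genes at once. It assumes the original non-gated ABMIL attention formulation; the class name `ABMILRegressor`, the hidden size, and the random initialization are hypothetical, and training is omitted.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


class ABMILRegressor:
    """Forward pass of a hypothetical non-gated ABMIL model with an N-gene linear head.

    feat_dim (D), hidden_dim and n_genes are placeholders; weights are random and
    training (e.g. with an MSE loss on expression values) is omitted.
    """

    def __init__(self, feat_dim: int, hidden_dim: int, n_genes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.V = rng.normal(0.0, feat_dim ** -0.5, size=(hidden_dim, feat_dim))
        self.w = rng.normal(0.0, hidden_dim ** -0.5, size=hidden_dim)
        self.W_out = rng.normal(0.0, feat_dim ** -0.5, size=(n_genes, feat_dim))
        self.b_out = np.zeros(n_genes)

    def attention(self, patches: np.ndarray) -> np.ndarray:
        """Attention weights a_i = softmax_i(w^T tanh(V p_i)) for patches of shape (n, D)."""
        scores = np.tanh(patches @ self.V.T) @ self.w
        return softmax(scores)

    def __call__(self, patches: np.ndarray) -> np.ndarray:
        a = self.attention(patches)
        wsi_feature = a @ patches                      # attention-weighted mean, shape (D,)
        return self.W_out @ wsi_feature + self.b_out   # predicted profile, shape (n_genes,)
```

Because the attention weights are a softmax over patches, the prediction does not depend on patch order, which suits WSIs with a variable number of unordered patches.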
Figure 5: Overview of the simple kNN baseline, which contains no trainable model parameters. For the input WSI $x$, a WSI-level feature vector $w(x)$ is directly computed as the mean over the patch-level feature vectors $p(\tilde{x}_i)$. kNN with $k = 100$ is then utilized to output a predicted gene-expression profile $\hat{y}(x) \in \mathbb{R}^N$.
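A minimal sketch of the kNN baseline in Figure 5. The caption does not specify the distance metric or how the neighbors are combined, so Euclidean distance and an unweighted mean of the k nearest training WSIs' expression profiles are assumptions here; the function names are hypothetical.

```python
import numpy as np


def wsi_feature(patch_features: np.ndarray) -> np.ndarray:
    """w(x): mean over the patch-level feature vectors p(x_i), shape (n_patches, D) -> (D,)."""
    return patch_features.mean(axis=0)


def knn_predict(train_feats: np.ndarray, train_expr: np.ndarray,
                query_feat: np.ndarray, k: int = 100) -> np.ndarray:
    """Average the expression profiles of the k training WSIs closest to the query.

    Assumes Euclidean distance and uniform weights.
    train_feats: (n_train, D), train_expr: (n_train, N), query_feat: (D,).
    """
    d = np.linalg.norm(train_feats - query_feat, axis=1)
    k = min(k, len(d))  # cannot use more neighbors than training WSIs
    nearest = np.argpartition(d, k - 1)[:k]  # indices of the k smallest distances
    return train_expr[nearest].mean(axis=0)
```

No parameters are learned: the training WSI features and their expression profiles are simply stored and queried at prediction time.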