SpaRED benchmark: Enhancing Gene Expression Prediction from Histology Images with Spatial Transcriptomics Completion

Gabriel Mejia; Daniela Ruiz; Paula Cárdenas; Leonardo Manrique; Daniela Vega; Pablo Arbeláez

TL;DR

We present a systematically curated and processed database collected from 26 public sources, an 8.6-fold increase over previous works, and propose a state-of-the-art, transformer-based completion technique for inferring missing gene expression that significantly improves transcriptomic profile prediction across all datasets.

Abstract

Spatial Transcriptomics is a novel technology that aligns histology images with spatially resolved gene expression profiles. Although groundbreaking, it suffers from limited gene capture, which leaves the acquired data highly corrupted. Given its potential applications, recent efforts have focused on predicting transcriptomic profiles solely from histology images. However, differences in databases, preprocessing techniques, and training hyperparameters hinder a fair comparison between methods. To address these challenges, we present a systematically curated and processed database collected from 26 public sources, representing an 8.6-fold increase compared to previous works. Additionally, we propose a state-of-the-art, transformer-based completion technique for inferring missing gene expression, which significantly boosts the performance of transcriptomic profile predictions across all datasets. Altogether, our contributions constitute the most comprehensive benchmark of gene expression prediction from histology images to date and a stepping stone for future research on spatial transcriptomics.

Paper Structure (16 sections, 3 equations, 4 figures)

Introduction
Related Work
Integrated Databases
Completion Strategies
Gene Expression Prediction Benchmarks
Spatially Resolved Expression Database
Original Datasets and Curation
Benchmark of Existing Gene Prediction Methods
Gene Completion with Transformers
Implementation Details
Results and Discussion
Gene Completion Evaluation
Gene Prediction Benchmark
Conclusions
Acknowledgments

Figures (4)

Figure 1: (a) Organisms and tissues available in SpaRED, with the number of spots from each tissue. (b) Pearson correlation coefficient (PCC) of the predictions of each model across all datasets in SpaRED. For each dataset, the state-of-the-art model that obtains the highest PCC is included.
Figure 2: Overview of our data completion framework using a transformer-based model.
Figure 3: Completion results. Upper left: violin plot of completion MSE for each method (SpaCKLE, Median, and stLearn) across all datasets in SpaRED. Middle left: line plot of completion MSE for the Median and SpaCKLE methods across different percentages of synthetically masked data. Qualitative results: gene expression with increasing synthetic masking percentages (row 1), completed with the Median method (row 2) and with SpaCKLE (row 3).
Figure 4: (a) Violin plot: prediction MSE of each model across all datasets within SpaRED, normalized by the best MSE obtained on each dataset. The mean and standard deviation of each method are shown at the top of its violin. Pie chart: percentage of datasets within SpaRED for which each model achieves the best prediction MSE. (b) Mean normalized prediction MSE against the number of trainable parameters for each model.
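Figure 1(b) reports the Pearson correlation coefficient (PCC) between predicted and measured gene expression. The sketch below shows one common way to compute such a score, assuming PCC is computed per gene across spots and then averaged over genes; the aggregation used in the paper may differ, and the function names `pearson` and `mean_gene_pcc` are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # PCC is undefined for a constant vector
    return cov / math.sqrt(vx * vy)


def mean_gene_pcc(pred: List[List[float]], true: List[List[float]]) -> float:
    """Average per-gene PCC; pred and true are spots x genes matrices.

    Assumed aggregation (hypothetical): correlate each gene across spots,
    skip genes whose PCC is undefined, then average the remaining values.
    """
    scores = []
    for g in range(len(true[0])):
        r = pearson([spot[g] for spot in pred], [spot[g] for spot in true])
        if not math.isnan(r):
            scores.append(r)
    return sum(scores) / len(scores) if scores else float("nan")
```

Skipping constant genes is a design choice made here only to keep the average defined; a benchmark could instead report them separately.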
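Figure 3 evaluates completion by synthetically masking a percentage of the data and measuring MSE on the hidden entries. A minimal sketch of that protocol follows. It assumes the Median baseline fills each masked value with the median of the same gene's unmasked values across spots; the paper's Median method may instead use spatial neighbors, and SpaCKLE itself is not reproduced here. All function names are hypothetical.

```python
import random
import statistics
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (spot index, gene index)


def synthetic_mask(n_spots: int, n_genes: int, fraction: float, seed: int = 0) -> Set[Cell]:
    """Randomly choose a fraction of (spot, gene) entries to hide."""
    rng = random.Random(seed)
    cells = [(s, g) for s in range(n_spots) for g in range(n_genes)]
    return set(rng.sample(cells, round(fraction * len(cells))))


def median_complete(expr: List[List[float]], masked: Set[Cell]) -> List[List[float]]:
    """Assumed Median baseline: fill masked entries with the per-gene median
    of the unmasked spots. Returns a new matrix; expr is not modified."""
    n_spots, n_genes = len(expr), len(expr[0])
    completed = [row[:] for row in expr]
    for g in range(n_genes):
        observed = [expr[s][g] for s in range(n_spots) if (s, g) not in masked]
        fill = statistics.median(observed) if observed else 0.0
        for s in range(n_spots):
            if (s, g) in masked:
                completed[s][g] = fill
    return completed


def masked_mse(true: List[List[float]], completed: List[List[float]], masked: Set[Cell]) -> float:
    """MSE computed only over the synthetically masked entries."""
    if not masked:
        return 0.0
    return sum((true[s][g] - completed[s][g]) ** 2 for s, g in masked) / len(masked)


# Usage: 4 spots x 2 genes, hiding one value per gene.
expr = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
masked = {(0, 0), (3, 1)}
completed = median_complete(expr, masked)
print(masked_mse(expr, completed, masked))  # 202.0
```

Restricting the error to masked entries matters because only there is the ground truth known but hidden from the completion method; sweeping `fraction` reproduces the kind of curve shown in the middle-left panel.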
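Figure 4(a) divides each model's prediction MSE by the best MSE on the same dataset, so the winning model scores 1.0, and reports the share of datasets each model wins. The sketch below reproduces that bookkeeping; the nested-dictionary input layout, the use of the sample standard deviation, and the function names are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Dict, Tuple

Scores = Dict[str, Dict[str, float]]  # {dataset: {model: MSE}}


def normalize_by_best(mse: Scores) -> Scores:
    """Divide every model's MSE by the lowest MSE on that dataset."""
    out: Scores = {}
    for ds, scores in mse.items():
        best = min(scores.values())
        if best <= 0:
            raise ValueError(f"best MSE on {ds} must be positive")
        out[ds] = {m: v / best for m, v in scores.items()}
    return out


def summarize(mse: Scores) -> Tuple[Dict[str, Tuple[float, float]], Dict[str, float]]:
    """Return per-model (mean, sample std) of normalized MSE and win percentage."""
    norm = normalize_by_best(mse)
    models = sorted({m for scores in norm.values() for m in scores})
    stats = {}
    for m in models:
        vals = [norm[ds][m] for ds in norm if m in norm[ds]]
        stats[m] = (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
    # A model "wins" a dataset when it has the lowest raw MSE there.
    wins = Counter(min(scores, key=scores.get) for scores in mse.values())
    win_pct = {m: 100.0 * wins[m] / len(mse) for m in models}
    return stats, win_pct
```

Normalizing per dataset puts datasets with very different expression scales on a common footing before the violins and means are computed.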