Learning Relative Gene Expression Trends from Pathology Images in Spatial Transcriptomics

Kazuya Nishimura, Haruka Hirose, Ryoma Bise, Kaito Shiku, Yasuhiro Kojima

TL;DR

The paper tackles the high cost of spatial transcriptomics by predicting gene expression from pathology images, but notes that absolute expression estimates are fragile to batch effects and noise. It proposes STRank, a probabilistic learning-to-rank loss that models relative expression patterns across patches using pairwise (Binomial) and listwise (Multinomial) distributions, with corrections for sparse counts. In synthetic experiments and on seven real spatial transcriptomics datasets (HEST-1k), STRank is more robust to batch effects and sparsity than traditional losses, though the gains vary by dataset and setting. Overall, the work provides a principled framework for learning relative gene expression trends to improve reliability and transferability in spatial transcriptomics estimation.

Abstract

Gene expression estimation from pathology images has the potential to reduce the cost of RNA sequencing. Point-wise loss functions have been widely used to minimize the discrepancy between predicted and absolute gene expression values. However, due to the complexity of sequencing techniques and intrinsic variability across cells, the observed gene expression contains stochastic noise and batch effects, and accurately estimating absolute expression values remains a significant challenge. To mitigate this, we propose a novel objective of learning relative expression patterns rather than absolute levels. We assume that the relative expression levels of genes exhibit consistent patterns across independent experiments, even when absolute expression values are affected by batch effects and stochastic noise in tissue samples. Based on this assumption, we model the relation and propose a novel loss function called STRank that is robust to noise and batch effects. Experiments using synthetic datasets and real datasets demonstrate the effectiveness of the proposed method. The code is available at https://github.com/naivete5656/STRank.
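
The pairwise (Binomial) and listwise (Multinomial) ideas mentioned in the TL;DR can be illustrated with a small sketch. This is not the authors' STRank implementation (see the linked repository for that), and it omits the sparse-count corrections. It assumes, purely for illustration, that each patch gets a real-valued score whose exponential is proportional to expected expression. The function names `listwise_multinomial_nll` and `pairwise_binomial_nll` are hypothetical. The point of the sketch is that both losses depend only on score differences, so a constant multiplicative scaling of predicted expression, as a batch effect might introduce, leaves the loss unchanged.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def listwise_multinomial_nll(scores, counts):
    """Negative log-likelihood of the observed counts under
    Multinomial(sum(counts), softmax(scores)).
    The multinomial coefficient is dropped because it does not depend on scores.
    (Illustrative assumption, not the paper's exact formulation.)"""
    return -sum(y * lp for y, lp in zip(counts, log_softmax(scores)))


def pairwise_binomial_nll(score_i, score_j, count_i, count_j):
    """Negative log-likelihood of count_i given count_i + count_j under
    Binomial(count_i + count_j, sigmoid(score_i - score_j)),
    again dropping the score-independent binomial coefficient."""
    d = score_i - score_j
    return -(count_i * log_sigmoid(d) + count_j * log_sigmoid(-d))


# Hypothetical scores for three patches and their observed counts of one gene.
scores = [0.2, 1.5, -0.3]
counts = [3, 10, 1]

base = listwise_multinomial_nll(scores, counts)
# Adding a constant to every score multiplies exp(score) by a constant factor.
shifted = listwise_multinomial_nll([s + 2.0 for s in scores], counts)
# Scores whose ordering disagrees with the counts should be penalized more.
misordered = listwise_multinomial_nll([1.5, 0.2, -0.3], counts)
```

In this sketch `base` and `shifted` agree up to floating-point error, while `misordered` is larger than `base`. With exactly two patches, the listwise loss reduces to the pairwise one, since the two-element softmax is a sigmoid of the score difference.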

Paper Structure

This paper contains 19 sections, 12 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: (a) Illustration of scaling bias due to batch effects, (b) stochastic noise, and (c) our hypothesis: learning relative expression trends. Even in the presence of batch effects and stochastic noise, the relative expression trends between patches are preserved.
  • Figure 2: Example of synthetic data for validating the batch effect. Colors indicate patients; the dashed line represents the mean function to be learned; and the dots show observations. (a) Uniform setting: Each patient's data is drawn from a uniform distribution. (b) Imbalanced setting: Observed data is skewed.
  • Figure 3:
  • Figure 5: Visualization of $\mu(x)$.
  • Figure 6: Visualization of $\mu(x)$.