M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images

Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

TL;DR

The paper addresses predicting spatial transcriptomics (ST) gene expression directly from histopathology whole slide images (WSIs) by reframing the task as a many-to-one regression problem that exploits multi-scale pathology information. It introduces M2ORT, a Transformer-based encoder that decouples multi-scale feature extraction into Level-Dependent Patch Embedding (LDPE), an Intra-Level Token Mixing Module (ITMM), and an Inter-Level Channel Mixing Module (ICMM), followed by a regression head. On three public ST datasets, M2ORT variants achieve state-of-the-art Pearson correlation coefficient (PCC) and root mean squared error (RMSE) with fewer parameters and FLOPs than existing patch-level and slide-level baselines. The approach offers a cost-effective, scalable way to predict ST maps from readily available WSIs, with consistent results across the three datasets.
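The two metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' evaluation code: the function names, the spots-by-genes layout of the inputs, the choice to average PCC over genes, and the convention of returning 0.0 for a constant series are all assumptions made here for clarity.

```python
import math
from typing import List, Sequence


def pearson(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        # PCC is undefined for a constant series; 0.0 is a convention assumed here.
        return 0.0
    return cov / math.sqrt(var_p * var_t)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between two equal-length series."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


def mean_gene_pcc(pred: List[List[float]], target: List[List[float]]) -> float:
    """Average per-gene PCC, where rows are spots and columns are genes (assumed layout)."""
    num_genes = len(pred[0])
    scores = []
    for g in range(num_genes):
        pred_col = [row[g] for row in pred]
        target_col = [row[g] for row in target]
        scores.append(pearson(pred_col, target_col))
    return sum(scores) / num_genes
```

Higher PCC and lower RMSE both indicate better agreement between predicted and measured expression; PCC is insensitive to scale and offset, while RMSE penalizes absolute deviations.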

Abstract

The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, its acquisition remains expensive. Therefore, directly predicting ST expressions from digital pathology images is desirable. Current methods usually adopt existing regression backbones for this task, which ignore the inherent multi-scale hierarchical data structure of digital pathology images. To address this limitation, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images through a decoupled multi-scale feature extractor. Unlike traditional models that are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology images of different magnifications at a time and jointly predicts the gene expressions at their common ST spot, aiming to learn a many-to-one relationship through training. We have tested M2ORT on three public ST datasets, and the experimental results show that M2ORT can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs). The code is available at: https://github.com/Dootmaan/M2ORT/.
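To make the many-to-one setup concrete, the sketch below builds one training sample: several crops of the same spot, one per pyramid level, paired with a single expression vector. It is an illustration under stated assumptions, not the authors' data pipeline. The names `ManyToOneSample`, `level_boxes`, and `tokens_per_side` are hypothetical, and the 224-pixel level-0 crop, the downsampling factor of 2 between levels, and the level-0 patch size of 16 are assumed values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass(frozen=True)
class ManyToOneSample:
    """Several crops of one ST spot (one per pyramid level) sharing a single label."""
    boxes: List[Box]         # boxes[i] is expressed in level-i pixel coordinates
    expression: List[float]  # gene expression vector measured at the spot


def level_boxes(cx: int, cy: int, size0: int = 224,
                num_levels: int = 3, factor: int = 2) -> List[Box]:
    """Crop boxes covering the same tissue region at each level (assumed geometry).

    (cx, cy) is the spot centre in level-0 pixels. Each coarser level divides
    both the coordinates and the crop size by `factor`.
    """
    boxes = []
    for level in range(num_levels):
        scale = factor ** level
        size = size0 // scale
        left = cx // scale - size // 2
        top = cy // scale - size // 2
        boxes.append((left, top, left + size, top + size))
    return boxes


def tokens_per_side(size0: int = 224, patch0: int = 16,
                    num_levels: int = 3, factor: int = 2) -> List[int]:
    """Tokens along one side per level if the patch size shrinks with the level.

    Shrinking the patch size by the same factor as the image is an assumption
    made here to show how level-dependent patching can equalise sequence length.
    """
    return [(size0 // factor ** lvl) // (patch0 // factor ** lvl)
            for lvl in range(num_levels)]


# One sample: three boxes around the same spot, one (made-up) label vector.
sample = ManyToOneSample(level_boxes(1000, 2000), expression=[0.3, 1.2, 0.0])
```

Under these assumptions every level yields the same number of tokens per side, so the per-level sequences can be processed separately and then combined, which is the kind of arrangement a many-to-one model needs.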

Paper Structure (43 sections, 23 equations, 10 figures, 7 tables, 1 algorithm)

Figures (10)

  • Figure 1: (a) WSIs are obtained by scanning the glass slide tissues at different magnifications, resulting in a multi-scale pyramid data structure. (b) ST maps are generated by sampling spots on the glass slide tissues, followed by comprehensive profiling of gene expressions within each sampled spot.
  • Figure 2: (a) One-to-one regression models optimized with single-level image-label pairs. (b) One-to-one regression models optimized with multi-level image-label pairs. (c) Our proposed many-to-one regression model optimized with multi-level imageset-label pairs.
  • Figure 3: A schematic view of the proposed M2ORT. Three patches from different WSI levels are fed into the model to jointly predict the gene expressions in the corresponding spot. PE denotes Positional Encoding in the figure.
  • Figure 4: The network structure of ITMM. This module needs to be applied to each level's sequence separately.
  • Figure 5: The network structure of ICMM.
  • ...and 5 more figures