M2ORT: Many-To-One Regression Transformer for Spatial Transcriptomics Prediction from Histopathology Images
Hongyi Wang, Xiuju Du, Jing Liu, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin
TL;DR
The paper tackles predicting spatial transcriptomics (ST) gene expression directly from histopathology whole slide images (WSIs) by reframing the task as a many-to-one regression problem that leverages multi-scale pathology information. It introduces M2ORT, a Transformer-based encoder that decouples multi-scale feature extraction via Level-Dependent Patch Embedding (LDPE), an Intra-Level Token Mixing Module (ITMM), and an Inter-Level Channel Mixing Module (ICMM), followed by a regression head. Across three public ST datasets, M2ORT variants achieve state-of-the-art Pearson correlation coefficient (PCC) and root mean squared error (RMSE) with fewer parameters and FLOPs than existing patch-level and slide-level baselines. This approach enables cost-effective, scalable ST map prediction from readily available WSIs, with robust cross-dataset performance and interpretable multi-scale fusion.
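As a rough illustration of the two reported metrics, the sketch below computes PCC and RMSE for one gene across a set of spots. It uses only the textbook definitions; the function names `pearson_cc` and `rmse` are hypothetical, and how the paper aggregates the scores over genes and slides is not stated here, so that step is left out.

```python
import math


def pearson_cc(pred, target):
    """Pearson correlation between predicted and measured expression of one gene."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        # Correlation is undefined for a constant series; returning 0.0 is
        # a convention chosen for this sketch, not taken from the paper.
        return 0.0
    return cov / math.sqrt(var_p * var_t)


def rmse(pred, target):
    """Root mean squared error between predicted and measured expression."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)
```

A higher PCC and a lower RMSE both indicate better agreement between predicted and measured expression.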
Abstract
The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, its acquisition remains expensive. Therefore, directly predicting ST expressions from digital pathology images is desirable. Current methods usually adopt existing regression backbones for this task, which ignore the inherent multi-scale hierarchical data structure of digital pathology images. To address this limitation, we propose M2ORT, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images through a decoupled multi-scale feature extractor. Different from traditional models that are trained with one-to-one image-label pairs, M2ORT accepts multiple pathology images of different magnifications at a time to jointly predict the gene expressions at their corresponding common ST spot, aiming at learning a many-to-one relationship through training. We have tested M2ORT on three public ST datasets, and the experimental results show that M2ORT can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs). The code is available at: https://github.com/Dootmaan/M2ORT/.
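To make the many-to-one pairing concrete, the sketch below builds one training sample: several crop boxes, one per magnification level, all centered on the same ST spot and sharing a single expression target. Everything specific here is an assumption for illustration, not taken from the paper or its repository: the helper names `multiscale_boxes` and `make_sample`, the 224-pixel region size, the downsample factors (1, 2, 4), and the choice to cover the same tissue region at every level.

```python
def multiscale_boxes(cx, cy, region=224, downsamples=(1, 2, 4)):
    """Return one (left, top, right, bottom) crop box per pyramid level.

    (cx, cy) is the spot center in level-0 pixel coordinates. Each box is
    expressed in the pixel coordinates of its own level and covers the same
    tissue region, so coarser levels yield smaller crops (assumed design).
    """
    boxes = []
    for d in downsamples:
        side = region // d           # crop side length at this level
        left = cx // d - side // 2   # spot center mapped into this level
        top = cy // d - side // 2
        boxes.append((left, top, left + side, top + side))
    return boxes


def make_sample(cx, cy, expression):
    """Pair all crops of one spot with its single gene-expression vector."""
    return {"boxes": multiscale_boxes(cx, cy), "target": list(expression)}


# Three inputs, one label: the many-to-one relationship.
sample = make_sample(1000, 2000, [0.3, 1.2, 0.0])
```

With these assumed settings the sample holds three boxes of side 224, 112, and 56 pixels, and one target vector. A one-to-one baseline would instead keep only the first box.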
