From Spots to Pixels: Dense Spatial Gene Expression Prediction from Histology Images
Ruikun Zhang, Yan Yang, Liyuan Pan
TL;DR
PixNet reframes spatial gene expression prediction, moving from spot-wise regression on fixed crops to dense, pixel-wise mapping from histology images. It combines a multi-scale pyramidal feature extractor with a U-Net-style decoder to produce a dense gene expression map G, then aggregates values within circular ROIs to predict expression for arbitrary spots, all trained with sparse supervision. Across four ST datasets and multiple scales, PixNet achieves state-of-the-art PCC-based metrics and generalizes robustly across scales (e.g., training at $100\,\mu\text{m}$ and testing at $2\,\mu\text{m}$), while ablations highlight the importance of the SAFB module, the joint loss, and a foundation encoder such as UNI2. The approach supports accurate, scalable spatial transcriptomics analysis directly from standard histology images and has the potential to improve downstream tissue molecular profiling and clinical interpretation. The authors plan to release the source code publicly.
Abstract
Spatial transcriptomics (ST) measures gene expression at fine-grained spatial resolution, offering insights into tissue molecular landscapes. Previous methods for spatial gene expression prediction typically crop spots of interest from histopathology slide images and train models to map each spot to a corresponding gene expression profile. However, these methods inherently lose spatial resolution in gene expression: 1) each spot often contains multiple cells with distinct gene expression profiles; 2) spots are typically defined at fixed spatial resolutions, limiting the ability to predict gene expression at varying scales. To address these limitations, this paper presents PixNet, a dense prediction network capable of predicting spatially resolved gene expression across spots of varying sizes and scales directly from histopathology slide images. Unlike previous methods that map individual spots to gene expression values, we generate a spatially dense, continuous gene expression map from the histopathology slide image and aggregate values within spots of interest to predict the gene expression. PixNet outperforms state-of-the-art methods on four common ST datasets at multiple spatial scales. The source code will be publicly available.
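
The aggregation step described above, pooling a dense map over a spot of interest, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `aggregate_spot`, the (H, W, n_genes) array layout, the pixel-unit circular mask, and the choice of mean or sum pooling are all assumptions made for illustration, since the abstract does not specify the aggregation operator.

```python
import numpy as np


def aggregate_spot(dense_map, center_xy, radius, reduce="mean"):
    """Pool a dense (H, W, n_genes) map over a circular ROI.

    Hypothetical helper: the pooling operator (mean or sum) is an
    assumption, not taken from the paper.
    """
    h, w, _ = dense_map.shape
    cx, cy = center_xy
    # Open grids of row (y) and column (x) indices, broadcast to (H, W).
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    if not mask.any():
        raise ValueError("ROI does not overlap the dense map")
    values = dense_map[mask]  # shape: (n_pixels_in_roi, n_genes)
    if reduce == "mean":
        return values.mean(axis=0)
    if reduce == "sum":
        return values.sum(axis=0)
    raise ValueError(f"unknown reduce mode: {reduce!r}")


# Usage: the same dense map can be queried with spots of different sizes.
toy_map = np.random.default_rng(0).random((64, 64, 5))  # stand-in for G
large_spot = aggregate_spot(toy_map, center_xy=(32, 32), radius=20)
small_spot = aggregate_spot(toy_map, center_xy=(32, 32), radius=2)
print(large_spot.shape, small_spot.shape)  # both are (5,)
```

Because the spot radius is only an argument to the pooling step, the same dense map can in principle be queried at any scale, which is the property the abstract contrasts with fixed-resolution spot-wise regression.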
