GenAR: Next-Scale Autoregressive Generation for Spatial Gene Expression Prediction

Jiarui Ouyang, Yihui Wang, Yihang Gao, Yingxue Xu, Shu Yang, Hao Chen

TL;DR

GenAR tackles the cost and interpretability barriers of spatial transcriptomics by predicting spatial gene expression from H&E images with a discrete, multi-scale autoregressive framework. It groups genes into hierarchical clusters and uses codebook-free discrete token generation to predict raw counts, conditioning decoding on fused histological and spatial embeddings via AdaLN. The method factorizes the conditional distribution over $K$ scales as $p(\mathbf{y}|\mathbf{H})=\prod_{k=1}^{K} p(\mathbf{y}^{(k)}|\mathbf{H},\mathbf{y}^{(<k)})$ and optimizes a multi-scale loss $\mathcal{L}_{\text{total}}=\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}_k$, in which the intermediate scales contribute KL terms and the final scale contributes a Gaussian likelihood with variance $\sigma^2=\alpha\mu+\beta$. Empirically, GenAR achieves state-of-the-art performance on four spatial transcriptomics datasets, demonstrating robust cross-tissue generalization and practical potential for cost-effective molecular profiling.
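
To make the loss structure in the TL;DR concrete, here is a minimal NumPy sketch of an objective with that shape: KL terms at the intermediate scales, a Gaussian negative log-likelihood with variance $\sigma^2=\alpha\mu+\beta$ at the final scale, and a plain average over the $K$ scales. It is not the authors' implementation. The function names, the toy shapes, the values of `alpha` and `beta`, and the way the intermediate target distributions are obtained are all assumptions made for illustration; the summary above does not specify them.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), summed over the last axis and averaged over the rest."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

def gaussian_nll(y, mu, alpha=1.0, beta=1.0):
    """Gaussian NLL with mean-dependent variance sigma^2 = alpha * mu + beta.

    Assumes mu >= 0 and beta > 0 so that the variance stays positive.
    """
    var = alpha * mu + beta
    return float(np.mean(0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)))

def multi_scale_loss(per_scale_losses):
    """L_total = (1/K) * sum_k L_k."""
    return float(np.mean(per_scale_losses))

# Toy example with K = 3 scales (all shapes and values are hypothetical).
rng = np.random.default_rng(0)
target_1, pred_1 = softmax(rng.normal(size=(5, 8))), softmax(rng.normal(size=(5, 8)))
target_2, pred_2 = softmax(rng.normal(size=(5, 8))), softmax(rng.normal(size=(5, 8)))
mu = rng.uniform(0.0, 10.0, size=(5, 20))       # predicted mean counts at the final scale
y = rng.poisson(mu).astype(float)               # stand-in for observed raw counts

losses = [
    kl_divergence(target_1, pred_1),            # intermediate scale 1
    kl_divergence(target_2, pred_2),            # intermediate scale 2
    gaussian_nll(y, mu, alpha=1.0, beta=1.0),   # final scale
]
total = multi_scale_loss(losses)
```

Tying the variance to the mean lets the likelihood tolerate larger errors on highly expressed genes, which is one plausible reason to prefer it over a fixed-variance (squared-error) objective for count data.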

Abstract

Spatial Transcriptomics (ST) offers spatially resolved gene expression but remains costly. Predicting expression directly from widely available Hematoxylin and Eosin (H&E) stained images presents a cost-effective alternative. However, most computational approaches (i) predict each gene independently, overlooking co-expression structure, and (ii) cast the task as continuous regression despite expression being discrete counts. This mismatch can yield biologically implausible outputs and complicate downstream analyses. We introduce GenAR, a multi-scale autoregressive framework that refines predictions from coarse to fine. GenAR clusters genes into hierarchical groups to expose cross-gene dependencies, models expression as codebook-free discrete token generation to directly predict raw counts, and conditions decoding on fused histological and spatial embeddings. From an information-theoretic perspective, the discrete formulation avoids log-induced biases and the coarse-to-fine factorization aligns with a principled conditional decomposition. Extensive experimental results on four Spatial Transcriptomics datasets across different tissue types demonstrate that GenAR achieves state-of-the-art performance, offering potential implications for precision medicine and cost-effective molecular profiling. Code is publicly available at https://github.com/oyjr/genar.
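
One way to picture the coarse-to-fine gene hierarchy described in the abstract is to pool per-gene counts within each gene group, giving one target per scale. The sketch below is only an illustration under stated assumptions: the function `pool_to_scale`, the hand-written group assignments, and the choice of mean pooling are hypothetical and are not taken from the paper, which may build its clusters and scale targets differently.

```python
import numpy as np

def pool_to_scale(counts, group_ids):
    """Average per-gene counts within each gene group.

    counts:    (n_spots, n_genes) array of raw counts
    group_ids: (n_genes,) integer group label per gene, using every
               label in 0..n_groups-1 at least once
    returns:   (n_spots, n_groups) array of group means
    """
    counts = np.asarray(counts, dtype=float)
    group_ids = np.asarray(group_ids)
    n_groups = int(group_ids.max()) + 1
    onehot = np.eye(n_groups)[group_ids]    # (n_genes, n_groups) membership matrix
    sums = counts @ onehot                  # per-group sums for each spot
    sizes = onehot.sum(axis=0)              # number of genes in each group
    return sums / sizes

# Two spots, four genes (toy values).
counts = np.array([[4, 2, 0, 6],
                   [1, 3, 5, 7]])

# Hypothetical three-level hierarchy: 1 group -> 2 groups -> 4 single genes.
hierarchy = [
    np.array([0, 0, 0, 0]),
    np.array([0, 0, 1, 1]),
    np.array([0, 1, 2, 3]),
]
targets = [pool_to_scale(counts, g) for g in hierarchy]
# targets[0] -> [[3.], [4.]]
# targets[1] -> [[3., 3.], [2., 6.]]
# targets[2] -> the per-gene counts themselves
```

In this picture the coarsest scale captures overall expression level per spot, and each finer scale only has to resolve how that signal splits among smaller gene groups, which is the intuition behind conditioning each scale on the coarser ones.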

Paper Structure

This paper contains 30 sections, 6 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: Overall architecture of GenAR. (a) Genes are clustered into hierarchical groups from coarse to fine granularity. (b) Image and spatial features are fused to generate histological embeddings. (c) Multi-scale autoregressive generation progressively refines predictions across scales.
  • Figure 2: Progressive multi-scale generation process, illustrating sequence construction and upsampling initialization during training and inference phases.
  • Figure 3: Spatial visualization of SSR4 gene expression prediction on HER2ST SPA148 sample. From left to right: histopathological image, ground truth, and predictions from GenAR, BLEEP, M2OST, TRIPLEX, and STEM. Color scale: low (purple/blue) to high (yellow/green) expression.
  • Figure 4: Spatial visualization comparison of C12orf57 gene expression prediction on HER2ST SPA148 sample.
  • Figure 5: Spatial visualization comparison of EIF4G1 gene expression prediction on HER2ST SPA148 sample.
  • ...and 8 more figures