
Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics

Yaoyu Fang, Jiahe Qian, Xinkun Wang, Lee A. Cooper, Bo Zhou

Abstract

Spatial Transcriptomics (ST) provides spatially resolved gene expression profiles within intact tissue architecture, enabling molecular analysis in histological context. However, the high cost, limited throughput, and restricted data sharing of ST experiments result in severe data scarcity, constraining the development of robust computational models. To address this limitation, we present a Central-to-Local adaptive generative diffusion framework for ST (C2L-ST) that integrates large-scale morphological priors with limited molecular guidance. A global central model is first pretrained on extensive histopathology datasets to learn transferable morphological representations, and institution-specific local models are then adapted through lightweight gene-conditioned modulation using a small number of paired image-gene spots. This strategy enables the synthesis of realistic and molecularly consistent histology patches under data-limited conditions. The generated images exhibit high visual and structural fidelity, reproduce cellular composition, and show strong embedding overlap with real data across multiple organs, reflecting both realism and diversity. When incorporated into downstream training, synthetic image-gene pairs improve gene expression prediction accuracy and spatial coherence, achieving performance comparable to real data while requiring only a fraction of sampled spots. C2L-ST provides a scalable and data-efficient framework for molecular-level data augmentation, offering a domain-adaptive and generalizable approach for integrating histology and transcriptomics in spatial biology and related fields.
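The abstract describes adapting the pretrained central model through "lightweight gene-conditioned modulation." The paper's exact mechanism is not specified here; one common lightweight conditioning scheme is FiLM-style feature modulation, where a gene-expression vector produces per-channel scale and shift parameters applied to image features. The sketch below illustrates this idea with hypothetical names (`film_modulate`, `W_gamma`, `W_beta`); it is an assumption about the conditioning form, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(features, gene_expr, W_gamma, W_beta):
    """FiLM-style conditioning: a gene-expression vector yields per-channel
    scale (gamma) and shift (beta) applied to image feature tokens.
    With gene_expr = 0 this reduces to the identity, so the pretrained
    morphological prior is preserved at initialization."""
    gamma = gene_expr @ W_gamma   # shape: (n_channels,)
    beta = gene_expr @ W_beta     # shape: (n_channels,)
    return features * (1.0 + gamma) + beta

n_genes, n_channels = 50, 8
# Small random projection weights; in practice these would be the
# trainable parameters updated during local adaptation.
W_gamma = rng.normal(scale=0.01, size=(n_genes, n_channels))
W_beta = rng.normal(scale=0.01, size=(n_genes, n_channels))

features = rng.normal(size=(16, n_channels))   # 16 patch feature tokens
gene_expr = rng.normal(size=(n_genes,))        # one spot's expression vector
out = film_modulate(features, gene_expr, W_gamma, W_beta)
print(out.shape)                               # (16, 8)
```

Because the modulation adds only two small projection matrices per conditioned layer, the adaptation remains lightweight relative to fine-tuning the full diffusion backbone, which is consistent with the data-limited setting the abstract emphasizes.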

Paper Structure

This paper contains 17 sections, 10 equations, 13 figures, and 2 tables.

Figures (13)

  • Figure 1: Overview of the C2L-ST framework for central-to-local adaptive generative modeling in spatial transcriptomics. The framework integrates large-scale public histopathology data with limited local ST data to enable molecular-level generation across centers worldwide. The central generative diffusion model is first pretrained on large-scale histopathology image patches to learn diverse tissue morphological priors. Institution-specific local models from different centers are then adapted using a small number of paired image–gene samples (red), aligning morphological features with molecular information to produce synthetic histology patches conditioned on gene expression (green). The resulting synthetic data expand the effective training distribution without requiring the exchange of real patient data across institutions. The adapted local models are further applied to unsampled tissue regions (blue) to predict gene expression directly from histopathology images. This central-to-local strategy alleviates data scarcity by generating synthetic samples, reduces domain discrepancies through shared morphological priors, and respects institutional data privacy by allowing each center to adapt models locally without data exchange.
  • Figure 2: Detailed workflow of the C2L-ST framework integrating central pretraining and local adaptation. The central generative diffusion model is first trained on large-scale public histopathology datasets to learn transferable tissue morphological priors through iterative noising and denoising processes. Each institution/center then performs local adaptation using its private ST dataset, where histology patches and corresponding gene expression values (red) guide lightweight adaptation of the pretrained model to align morphology with molecular information. After adaptation, the local model generates synthetic histology patches conditioned on gene expression (green), expanding the effective dataset. These synthetic samples are combined with limited real data to train gene prediction networks that learn the mapping from histopathology images to gene expression. The trained predictors are subsequently applied to unsampled tissue regions (blue) to infer gene expression profiles directly from histology images. This unified pipeline enables molecular-level data augmentation, improves prediction performance under data-limited conditions, and maintains data confidentiality across centers.
  • Figure 3: Molecular and morphological correspondence in generated histology patches for the bowel (ZEN48) dataset. The heatmap shows hierarchical clustering of representative genes, revealing four distinct molecular clusters (A–D). Each cluster is associated with characteristic histology patches from synthetic images, shown below with color-coded borders corresponding to the cluster identity. Dashed borders indicate synthetic images, and solid borders indicate real images. The PCA projection of gene expression highlights the separation among clusters in molecular space. Representative synthetic patches within each cluster display distinct structures consistent with their molecular signatures, indicating that the model captures the relationship between transcriptional programs and local morphology.
  • Figure 4: Feature embedding similarity between real and synthetic histology patches across organs. t-SNE visualization of feature embeddings extracted from the pretrained Conch pathology encoder for real (brown) and synthetic (blue) histology patches in four spatial transcriptomics (ST) centers: bowel (ZEN48), breast (TENX13), skin (MEND40), and lung (MEND90). In each organ, the synthetic patches exhibit distributions that closely overlap with those of the real samples while also extending the overall embedding space, indicating both high fidelity and enhanced diversity of the generated data. This embedding-level consistency demonstrates that the C2L-ST model effectively captures local tissue heterogeneity and morphological diversity across distinct anatomical domains.
  • Figure 5: Comparison of cellular composition and morphological fidelity between real and synthetic histology patches. Violin plots on the left show the distribution of six major cell types, including background, neoplastic, inflammatory, connective, dead, and epithelial, quantified using HoverNet segmentation for real (orange) and synthetic (blue) images from bowel (ZEN48) and skin (MEND40) ST datasets. The segmentation visualizations on the right display corresponding real and synthetic tissue patches with color-coded cell-type annotations, where blue indicates non-neoplastic epithelial cells, red indicates neoplastic epithelial cells, yellow indicates inflammatory cells, and green indicates connective cells. The synthetic images reproduce cell-type distributions and spatial organization that closely resemble those of real samples, while also expanding the diversity of local cellular configurations. These results demonstrate that the C2L-ST framework preserves realistic cellular composition and structural integrity across distinct tissue domains.
  • ...and 8 more figures