Large-scale spatial variable gene atlas for spatial transcriptomics

Jiawen Chen, Jinwei Zhang, Dongshen Peng, Yutong Song, Aitong Ruan, Yun Li, Didong Li

TL;DR

This work addresses the need to benchmark SVG detection methods across diverse tissue types and platforms in spatial transcriptomics. It draws on the human slides of STimage-1K4M (662 slides, more than 18 tissue types) to evaluate 20 SVG methods, culminating in the first cross-tissue SVG atlas and insights into method robustness, scalability, and platform effects. Key findings include systematic differences in performance linked to tissue architecture, with methods like SINFONIA and Moran's I aligning well with pathologist-annotated domain-specific markers, and notable sensitivity to spatial domain imbalance. The resulting atlas and benchmarking framework provide a resource for method selection, cross-tissue biological discovery, and the development of adaptive, atlas-informed SVG detection.

Abstract

Spatial variable genes (SVGs) reveal critical information about tissue architecture, cellular interactions, and disease microenvironments. As spatial transcriptomics (ST) technologies proliferate, accurately identifying SVGs across diverse platforms, tissue types, and disease contexts has become both a major opportunity and a significant computational challenge. Here, we present a comprehensive benchmarking study of 20 state-of-the-art SVG detection methods using human slides from STimage-1K4M, a large-scale resource of ST data comprising 662 slides from more than 18 tissue types. We evaluate each method across a range of biologically and technically meaningful criteria, including recovery of pathologist-annotated domain-specific markers, cross-slide reproducibility, scalability to high-resolution data, and robustness to technical variation. Our results reveal marked differences in performance depending on tissue type, spatial resolution, and study design. Beyond benchmarking, we construct the first cross-tissue atlas of SVGs, enabling comparative analysis of spatial gene programs across cancer and normal tissues. We observe similarities between pairs of tissues that reflect developmental and functional relationships, such as high overlap between thymus and lymph node, and uncover spatial gene programs associated with metastasis, immune infiltration, and tissue-of-origin identity in cancer. Together, our work defines a framework for evaluating and interpreting spatial gene expression and establishes a reference resource for the ST community.

Paper Structure

This paper contains 16 sections, 39 figures.

Figures (39)

  • Figure 1: SVG method overview. (a) Conceptual overview of SVG detection methods. Most methods take as input a ST dataset containing gene expression and spatial coordinates, optionally incorporating additional modalities such as histology images or cell-type labels. The core output of each method is a score matrix or ranking, used to classify genes as SVGs or not. (b) Comparative summary of 20 SVG detection methods evaluated in this study.
  • Figure 2: Overview of the STimage-1K4M dataset and computational time evaluation. (a) Summary of the human portion of the STimage-1K4M dataset. (b) Distribution of number of spots (left) and number of genes (right) per slide. (c) A subset of 66 slides was annotated by pathologists to provide ground-truth labels. (d) Benchmarking of computational cost for 20 SVG detection methods.
  • Figure 3: Benchmarking SVG detection methods using domain-specific DE genes. (a) Illustration of the evaluation framework. Cancer slides: (b) Overall performance of 20 methods across 28 cancer-annotated slides, sorted by median Jaccard Index. (c) Method performance stratified by tissue type (breast and prostate) and technological platform (Spatial Transcriptomics and Visium). (d) Slide-level performance sorted by cancer proportion (bottom). Non-cancer slides: (e) Method performance across 32 non-cancer-annotated tissues. (f) Method performance stratified by tissue type (kidney, brain, breast, and prostate).
  • Figure 4: SVG detection robustness across tissues. (a) Illustration of the comparison of SVGs across multiple slides and methods. (b) Jaccard Index of within-tissue robustness for each method, ranked by median Jaccard Index. Each dot represents a tissue type. (c) Average pairwise Jaccard Index between slides of the same cancer tissue type for each method. (d) Jaccard Index of within-cancer-tissue robustness for each method, ranked by median Jaccard Index. Each dot represents a cancer type. (e) Average pairwise Jaccard Index between slides of the same non-cancer tissue type. (f) Jaccard Index of within-normal-tissue robustness for each method, ranked by median Jaccard Index. Each dot represents a normal tissue type.
  • Figure 5: Cross-tissue similarity of SVG sets across (a) cancer slides and (b) non-cancer slides. Boxplots of pairwise Jaccard Index between (c) breast cancer, (d) skin cancer, (e) liver cancer and other tissues.
  • ...and 34 more figures
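
The captions for Figures 3 to 5 report results as a Jaccard Index between gene sets, either between a method's SVGs and domain-specific DE genes or between the SVG sets of different slides. The sketch below shows how such scores could be computed. It is only an illustration, not the authors' pipeline: the function names, slide names, and gene identifiers are hypothetical, and the toy sets stand in for whatever thresholded SVG lists a method produces.

```python
from itertools import combinations


def jaccard_index(set_a, set_b):
    """Jaccard Index |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(gene_sets):
    """Average Jaccard Index over all unordered pairs of gene sets."""
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        raise ValueError("need at least two gene sets")
    return sum(jaccard_index(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: SVG sets called by one method on three slides
# of the same tissue, plus a hypothetical set of domain-specific markers.
svgs_by_slide = {
    "slide_1": {"GENE_A", "GENE_B", "GENE_C"},
    "slide_2": {"GENE_B", "GENE_C", "GENE_D"},
    "slide_3": {"GENE_C", "GENE_D", "GENE_E"},
}
domain_markers = {"GENE_B", "GENE_C", "GENE_F"}

# Per-slide agreement between called SVGs and the marker set.
marker_recovery = {
    slide: jaccard_index(genes, domain_markers)
    for slide, genes in svgs_by_slide.items()
}

# Within-tissue robustness as the average pairwise overlap between slides.
robustness = mean_pairwise_jaccard(list(svgs_by_slide.values()))

for slide, score in marker_recovery.items():
    print(f"{slide}: Jaccard vs markers = {score:.2f}")
print(f"mean pairwise Jaccard across slides = {robustness:.2f}")
```

Because the Jaccard Index depends on the sizes of both sets, comparisons across methods are only meaningful if each method's SVG list is thresholded in a comparable way; how the paper does this is not stated in the material above.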