Multi-Slice Spatial Transcriptomics Data Integration Analysis with STG3Net

Donghai Fang; Fangfang Zhu; Wenwen Min

TL;DR

STG3Net tackles batch effects in multi-slice spatial transcriptomics by integrating a masked graph autoencoder backbone with adversarial learning and a novel Global Nearest Neighbor (G2N) anchor-pair triplet mechanism. The framework jointly learns a robust latent space for cross-slice spatial domain identification and batch correction, aided by data augmentation, per-slice adjacency graphs, and a block-diagonal global graph. Key contributions include the plug-and-play G2N method, a masked self-supervised encoder, and comprehensive evaluations on three datasets from different platforms: human dorsolateral prefrontal cortex (DLPFC), adult mouse brain (AMB), and mouse embryo (ME). Ablation studies validate each component and objective function. Among the compared methods, STG3Net achieves the best overall accuracy, consistency, and batch correction (F1LISI) while preserving biological variability and connectivity across slices, enabling more reliable cross-slice analyses in SRT studies.
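The TL;DR names a G2N anchor-pair triplet mechanism but not its exact selection rule. The following is a minimal sketch under our own reading of "global nearest neighbor". For each sampled anchor spot, the positive is its nearest spot in embedding space pooled over all other slices, and the negative is a random different spot from the anchor's own slice. A standard triplet margin loss then pulls cross-slice neighbors together. The function names g2n_anchor_pairs and triplet_loss, the Euclidean distance, the negative-sampling rule, and margin=1.0 are all hypothetical and are not taken from the STG3Net code.

```python
import numpy as np


def g2n_anchor_pairs(z, batch, n_anchors, rng):
    """Hypothetical G2N-style triplet selection (not STG3Net's exact rule).

    For each sampled anchor spot, the positive is its nearest spot among all
    spots from other slices (a global pool) and the negative is a random
    different spot from the anchor's own slice. Requires at least two slices
    and at least two spots per slice.
    """
    z = np.asarray(z, dtype=float)
    batch = np.asarray(batch)
    n = z.shape[0]
    anchors = rng.choice(n, size=min(n_anchors, n), replace=False)
    # Squared Euclidean distances from every anchor to every spot.
    d2 = ((z[anchors, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    # Mask out same-slice spots so positives always come from another slice.
    d2[batch[anchors, None] == batch[None, :]] = np.inf
    positives = d2.argmin(axis=1)
    idx = np.arange(n)
    negatives = np.array([
        rng.choice(idx[(batch == batch[a]) & (idx != a)]) for a in anchors
    ])
    return anchors, positives, negatives


def triplet_loss(z, anchors, positives, negatives, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin)."""
    z = np.asarray(z, dtype=float)
    d_ap = np.linalg.norm(z[anchors] - z[positives], axis=1)
    d_an = np.linalg.norm(z[anchors] - z[negatives], axis=1)
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())


# Usage: two slices whose embeddings are identical copies of each other.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 8))
z = np.vstack([x, x])
batch = np.repeat([0, 1], 20)
a, p, n = g2n_anchor_pairs(z, batch, n_anchors=10, rng=rng)
print(triplet_loss(z, a, p, n))
```

Positives are drawn from the pooled spots of all other slices rather than from one designated reference slice, so the same code handles any number of slices.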

Abstract

With the rapid development of Spatially Resolved Transcriptomics (SRT) technologies, which map gene expression within tissue sections, integrative analysis of multiple SRT datasets has become increasingly important. However, batch effects between slices pose significant challenges for analyzing SRT data. To address these challenges, we developed a plug-and-play batch correction method called Global Nearest Neighbor (G2N) anchor pair selection, which mitigates batch effects by selecting representative anchor pairs across slices. Building on G2N, we propose STG3Net, which uses masked graph convolutional autoencoders as its backbone modules. Integrated with generative adversarial learning, these autoencoders enable STG3Net to achieve robust multi-slice spatial domain identification and batch correction. We comprehensively evaluate STG3Net on three multi-slice SRT datasets from different platforms in terms of accuracy, consistency, and the F1LISI metric (a measure of batch correction efficiency). Compared with existing methods, STG3Net achieves the best overall performance while preserving the biological variability and connectivity between slices. Source code and all public datasets used in this paper are available at https://github.com/wenwenmin/STG3Net and https://zenodo.org/records/12737170.
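The abstract scores batch correction with F1LISI but does not define it here. A common definition from single-cell batch-correction benchmarks, which we assume STG3Net follows, scales the median iLISI (batch mixing) and median cLISI (domain mixing) to [0, 1] by subtracting 1 and dividing by the number of categories minus 1. It then takes their F1 score: F1LISI = 2 (1 - cLISI_norm) iLISI_norm / ((1 - cLISI_norm) + iLISI_norm). The sketch below is a simplification. The function simple_lisi weights the k nearest neighbors uniformly, whereas the original LISI uses perplexity-calibrated Gaussian weights. Both function names and the default k=15 are our own choices.

```python
import numpy as np


def simple_lisi(z, labels, k=15):
    """Simplified LISI: inverse Simpson index of `labels` among each spot's
    k nearest neighbors, weighting neighbors uniformly (the original LISI
    uses perplexity-calibrated Gaussian weights). Requires k < n_spots."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a spot is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]
    cats = np.unique(labels)
    scores = np.empty(len(z))
    for i, nbrs in enumerate(knn):
        p = np.array([(labels[nbrs] == c).mean() for c in cats])
        scores[i] = 1.0 / (p ** 2).sum()
    return scores


def f1_lisi(z, batch, domain, k=15):
    """F1 score of normalized iLISI (higher = better batch mixing) and
    1 - normalized cLISI (higher = purer spatial domains).
    Requires at least two batches and two domains."""
    n_batch = len(np.unique(batch))
    n_domain = len(np.unique(domain))
    ilisi = (np.median(simple_lisi(z, batch, k)) - 1) / (n_batch - 1)
    clisi = (np.median(simple_lisi(z, domain, k)) - 1) / (n_domain - 1)
    a, b = 1 - clisi, ilisi
    return float(2 * a * b / (a + b)) if a + b > 0 else 0.0


# Usage: two domains far apart on a line, with batches alternating inside each.
x = np.concatenate([np.arange(10), np.arange(1000, 1010)]).astype(float)[:, None]
alternating = np.arange(20) % 2
blocks = np.repeat([0, 1], 10)
print(f1_lisi(x, batch=alternating, domain=blocks, k=4))  # 1.0: mixed batches, separate domains
print(f1_lisi(x, batch=blocks, domain=alternating, k=4))  # 0.0: batches not mixed at all
```

Taking an F1 score instead of an average means a method cannot score well by mixing batches at the cost of blurring spatial domains, or the reverse.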

Paper Structure (17 sections, 9 equations, 4 figures, 1 table)

Introduction
PROPOSED METHODS
Overview of the proposed STG3Net
Data augmentation and construction of neighbor graph
Latent representation learning with masked reconstruction
Adversarial learning for multiple slices
Triplet learning with global nearest neighbors
Evaluation criteria
EXPERIMENTS
Dataset description
Baseline methods
Implementation Details
STG3Net achieves consistent integration among the donors of DLPFC dataset
STG3Net preserves the biological variability and connectivity across adult mouse brain (AMB) data
STG3Net identifies the structural organization of developing mouse embryonic (ME) data
...and 2 more sections

Figures (4)

Figure 1: Overview of STG3Net. (A) Data preprocessing integrates multiple SRT datasets, enhances spot features, and constructs spatial adjacency graphs. (B) STG3Net learns latent representations using a backbone module built from a feature graph autoencoder, combined with adversarial learning and G2N for batch correction across slices. (C) The latent representations learned by STG3Net are used in downstream analyses, including clustering and UMAP visualization, and the reconstructed gene expression serves as the denoised output.
Figure 2: DLPFC. (A) Schematic of the DLPFC data. (B) Manual annotations for Donors 1, 2, and 3. (C) Evaluation of STG3Net and existing methods on the DLPFC dataset in terms of consistency, accuracy, and the F1LISI metric. (D) Comparison of spatial domain identification on four slices from Donor 3. (E) UMAP plot of slice embeddings colored by batch and by cortical layer. (F) Spatial autocorrelation of imputed genes.
Figure 3: AMB and ME. (A) Mouse brain coronal anatomical reference, panels a-c. (B) Evaluation of STG3Net and existing methods on the AMB dataset in terms of consistency, accuracy, and F1LISI. (C) Cluster 1 corresponds to the hippocampal region and preserves the developmental variability between slices along the anterior-posterior (AP) axis. (D) Connectivity between isocortical regions. (E) Manual annotation of tissue regions at the ME developmental stages E9.5, E10.5, and E11.5. (F) Evaluation of STG3Net and existing methods on the ME dataset. (G) and (H) Denoised marker gene expression for the developmental processes of four major tissues and organs, with the corresponding annotated regions.
Figure 4: (A) Ablation studies of different components and objective functions. (B) Ablation studies of mask rate and number of anchor nodes.
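Figure 4(B) varies the mask rate of the masked graph autoencoder. The following minimal sketch shows what that rate controls, assuming GraphMAE-style feature masking. A random fraction of spots has its input features replaced by a mask token before encoding, and the reconstruction loss is scored only on those spots. The zero-vector token, the mean-squared-error loss, and the names mask_spots and masked_mse are our assumptions. GraphMAE-style models typically learn the token and may use a different reconstruction loss.

```python
import numpy as np


def mask_spots(x, mask_rate, rng, mask_token=None):
    """Replace the features of round(mask_rate * n_spots) random spots with a
    mask token (zeros unless a token vector is given)."""
    x = np.asarray(x, dtype=float)
    n_spots, n_genes = x.shape
    n_mask = int(round(mask_rate * n_spots))
    masked_idx = rng.choice(n_spots, size=n_mask, replace=False)
    token = np.zeros(n_genes) if mask_token is None else np.asarray(mask_token, dtype=float)
    x_masked = x.copy()
    x_masked[masked_idx] = token  # broadcast the token to every masked row
    return x_masked, masked_idx


def masked_mse(x, x_rec, masked_idx):
    """Mean squared reconstruction error over the masked spots only."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    diff = x[masked_idx] - x_rec[masked_idx]
    return float((diff ** 2).mean())


# Usage: mask 30% of 100 spots. A real model would encode x_masked over the
# spatial graph and decode x_rec; here x_masked simply stands in for x_rec.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 50))
x_masked, idx = mask_spots(x, mask_rate=0.3, rng=rng)
print(len(idx), masked_mse(x, x_masked, idx))
```

Scoring only the masked spots forces the encoder to infer a spot's expression from its graph neighbors. This is why the mask rate trades off task difficulty against the amount of unmasked context available.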
