Single Cells Are Spatial Tokens: Transformers for Spatial Transcriptomic Data Imputation

Hongzhi Wen; Wenzhuo Tang; Wei Jin; Jiayuan Ding; Renming Liu; Xinnan Dai; Feng Shi; Lulu Shang; Hui Liu; Yuying Xie

TL;DR

This paper tackles missing data in high-resolution spatial transcriptomics by introducing SpaFormer, a transformer-based framework that treats cells as spatial tokens and uses spatially aware positional encodings. It systematically analyzes multiple encoding schemes and proposes a bi-level masked autoencoder to enable effective imputation with global cell interactions, aided by efficient Performer attention for scalability. Across three CosMx cellular-level datasets, SpaFormer achieves state-of-the-art imputation accuracy and enhanced clustering performance while demonstrating favorable computational efficiency. The work highlights the importance of selecting suitable spatial encodings and masking strategies to unlock long-range intercellular information for accurate spatial transcriptomic imputation.

Abstract

Spatially resolved transcriptomics brings exciting breakthroughs to single-cell analysis by providing physical locations along with gene expression. However, as a cost of the extremely high spatial resolution, the cellular-level spatial transcriptomic data suffer significantly from missing values. While a standard solution is to perform imputation on the missing values, most existing methods either overlook spatial information or only incorporate localized spatial context without the ability to capture long-range spatial information. Using multi-head self-attention mechanisms and positional encoding, transformer models can readily grasp the relationship between tokens and encode location information. In this paper, by treating single cells as spatial tokens, we study how to leverage transformers to facilitate spatial transcriptomics imputation. In particular, we investigate the following two key questions: (1) $\textit{how to encode spatial information of cells in transformers}$, and (2) $\textit{how to train a transformer for transcriptomic imputation}$. By answering these two questions, we present a transformer-based imputation framework, SpaFormer, for cellular-level spatial transcriptomic data. Extensive experiments demonstrate that SpaFormer outperforms existing state-of-the-art imputation algorithms on three large-scale datasets while maintaining superior computational efficiency.
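
To make the "cells as spatial tokens" idea concrete, here is a minimal NumPy sketch, not the SpaFormer implementation. It assumes a standard sinusoidal encoding applied to each cell's 2D coordinates and a single head of plain softmax self-attention; the paper compares several encoding schemes and uses Performer attention, and neither is reproduced here. All names (`sinusoidal_encoding_2d`, `self_attention`, `w_embed`) and the random toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def sinusoidal_encoding_2d(coords, dim):
    """Map (n_cells, 2) coordinates to (n_cells, dim) sinusoidal features.

    Assumption: a generic sin/cos encoding, half of the channels for x and
    half for y. This is one possible choice, not the paper's scheme.
    """
    assert dim % 4 == 0, "dim must be divisible by 4"
    n_freq = dim // 4
    freqs = 1.0 / (10000.0 ** (np.arange(n_freq) / n_freq))
    x = coords[:, 0:1] * freqs  # shape (n_cells, n_freq)
    y = coords[:, 1:2] * freqs
    return np.concatenate([np.sin(x), np.cos(x), np.sin(y), np.cos(y)], axis=1)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head softmax self-attention over all cells (global context)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Toy data (hypothetical): 6 cells, 10 genes, 2D positions.
rng = np.random.default_rng(0)
n_cells, n_genes, dim = 6, 10, 8
counts = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)
coords = rng.uniform(0.0, 100.0, size=(n_cells, 2))

# Each cell becomes one token: embedded expression plus positional encoding.
w_embed = rng.normal(scale=0.1, size=(n_genes, dim))
tokens = np.log1p(counts) @ w_embed + sinusoidal_encoding_2d(coords, dim)

w_q, w_k, w_v = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Because every row of `weights` spans all cells, each cell's output representation can draw on cells at any distance, which is the long-range spatial context the abstract contrasts with purely local methods.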

Paper Structure (32 sections, 13 equations, 6 figures, 4 tables)

Introduction
Preliminary
Problem Statement
Transformers
Encoding Spatial Information in Transformers
Our Framework: SpaFormer
Generalized Autoencoder Framework
Bi-level Masked Autoencoders
Experiment
Experimental settings
Imputation Performance
Scalability Analysis.
Clustering Performance
Ablation Study
Positional Encodings
...and 17 more sections

Figures (6)

Figure 1: A sample image of protein, RNA molecules, and segmented cells. Colors in sub-figure (a) indicate the protein molecules that are stained. These proteins contribute to the cell segmentation process, which results in the sub-figure (b). The final output from the pipeline consists of the position of each cell and a cell-by-gene count matrix.
Figure 2: An illustration of our transformer-based autoencoder framework for spatial transcriptomics data imputation.
Figure 3: Clustering performance on imputed data of Lung dataset.
Figure 4: Comparison between different positional encodings on three datasets. Values indicate Pearson correlation coefficient.
Figure 5: Ablation study on different autoencoder variants, i.e., vanilla autoencoders (AE), variational autoencoders (VAE), bi-level masked autoencoder (MVAE) and masked autoencoder (MAE). For each variant, we implement both MLP and ZINB decoders. Values indicate the Pearson correlation coefficient on the Liver dataset.
...and 1 more figures
