STProtein: predicting spatial protein expression from multi-omics data

Zhaorui Jiang, Yingfang Yuan, Lei Hu, Wei Pang

TL;DR

STProtein tackles the data imbalance between spatial transcriptomics and spatial proteomics by predicting spatial protein expression from RNA data using a graph attention autoencoder with multi-task learning. It builds a KNN-based feature graph, learns joint RNA-protein embeddings, and performs upstream protein prediction plus downstream clustering, achieving superior results across three spatial-omics datasets. Ablation studies show the importance of the KNN graph, multi-task loss, and GATv2 layers, confirming the design choices. By enabling cross-platform predictions and revealing latent tissue patterns, STProtein accelerates spatial multi-omics integration and the discovery of previously hidden patterns, or 'Dark Matter', in tissue biology, while acknowledging limitations and proposing multimodal extensions for future work.
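
The KNN-based feature graph mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distance over per-spot feature vectors, a symmetrized binary adjacency, and an arbitrary choice of k. The function name `knn_feature_graph`, the toy data, and these design choices are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def knn_feature_graph(features, k=3):
    """Build a symmetric, binary KNN adjacency matrix over spots.

    Assumption: neighbours are chosen by Euclidean distance in feature
    space; the metric and k used by STProtein may differ.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of spots")

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(features ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T

    # A spot must not be selected as its own neighbour.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k closest spots for every row.
    neighbours = np.argsort(dist, axis=1, kind="stable")[:, :k]

    adjacency = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, neighbours.ravel()] = True

    # Symmetrize: keep an edge if either endpoint selected the other.
    return adjacency | adjacency.T


# Toy example (hypothetical data): two well-separated groups of spots.
spots = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
adjacency = knn_feature_graph(spots, k=2)
print(adjacency.astype(int))
```

An adjacency matrix of this kind is the sort of input a graph attention layer consumes as its edge structure. Symmetrizing with a logical OR means a spot can end up with more than k neighbours, which is one common convention among several.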

Abstract

The integration of spatial multi-omics data from single tissues is crucial for advancing biological research. However, a significant data imbalance impedes progress: while spatial transcriptomics data is relatively abundant, spatial proteomics data remains scarce due to technical limitations and high costs. To overcome this challenge, we propose STProtein, a novel framework that leverages graph neural networks with a multi-task learning strategy. STProtein is designed to accurately predict unknown spatial protein expression from more accessible spatial multi-omics data, such as spatial transcriptomics. We believe that STProtein can effectively address the scarcity of spatial proteomics data, accelerating the integration of spatial multi-omics and potentially catalyzing transformative breakthroughs in the life sciences. The tool enables scientists to accelerate discovery by identifying complex and previously hidden spatial patterns of proteins within tissues, uncovering novel relationships between different marker genes, and exploring the biological "Dark Matter".

Paper Structure

This paper contains 28 sections, 30 equations, 10 figures, and 10 tables.

Figures (10)

  • Figure 1: Training Framework of STProtein
  • Figure 2: Workflow for Upstream and Downstream Tasks Using STProtein
  • Figure 3: (a) H&E Histological Image of Mouse Spleen Structure, adapted from long_2023_10362607. (b) Ground-Truth Clustering for Mouse Spleen, with Its Original Annotation ($\text{RpMZ}\Phi$, B Cell and T Cell) Shown on the Right. (c) UMAP and Clustering Visualization for STProtein, with Its Annotation ($\text{RpMZ}\Phi$, $\text{MZM}\Phi$, $\text{MMM}\Phi$, B Cell and T Cell) Shown on the Right.
  • Figure 4: Parameter Sensitivity Experiments on the Reconstruction Loss Weights for the RNA and Protein Terms on the Mouse Spleen Dataset (SPOTS). Heatmap Values Are Percentages.
  • Figure 5: Visual Comparison of Benchmark Prediction Methods and STProtein Against the Original Ground Truth on the Mouse Spleen Dataset (SPOTS)
  • ...and 5 more figures