STProtein: predicting spatial protein expression from multi-omics data
Zhaorui Jiang, Yingfang Yuan, Lei Hu, Wei Pang
TL;DR
STProtein tackles the data imbalance between spatial transcriptomics and spatial proteomics by predicting spatial protein expression from RNA data using a graph attention autoencoder with multi-task learning. It builds a KNN-based feature graph, learns joint RNA-protein embeddings, and performs upstream protein prediction plus downstream clustering, achieving superior results across three spatial-omics datasets. Ablation studies show the importance of the KNN graph, the multi-task loss, and the GATv2 layers, supporting the design choices. By enabling cross-platform predictions and revealing latent tissue patterns, STProtein accelerates spatial multi-omics integration and the discovery of previously hidden patterns, or 'Dark Matter', in tissue biology. The authors also acknowledge limitations and propose multimodal extensions as future work.
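To make the KNN-based feature graph mentioned above concrete, the following is a minimal sketch of how such a graph could be built from a per-spot feature matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `knn_edges`, the Euclidean metric, the choice of `k`, and the directed edge-list layout are all hypothetical, since this summary does not specify them.

```python
import numpy as np


def knn_edges(features: np.ndarray, k: int) -> np.ndarray:
    """Return a (2, n * k) array of directed edges (source, target).

    Each spot (row of `features`) is linked to its k nearest neighbours
    in feature space. Euclidean distance is an assumption made here for
    illustration; the source text does not name a metric.
    """
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of spots")

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # A spot must not be selected as its own neighbour.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k closest spots for every row.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Flatten into an edge list: row 0 holds sources, row 1 holds targets.
    sources = np.repeat(np.arange(n), k)
    return np.stack([sources, neighbours.ravel()])


# Toy example: four spots with one feature each, forming two tight pairs.
toy_features = np.array([[0.0], [1.0], [10.0], [11.0]])
toy_edges = knn_edges(toy_features, k=1)
```

An edge list of this shape is the usual input format for graph attention layers in common graph learning libraries. The dense distance matrix keeps the sketch short but scales quadratically with the number of spots, so a tree-based or approximate neighbour search would likely be preferable for real tissue sections.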
Abstract
The integration of spatial multi-omics data from single tissues is crucial for advancing biological research. However, a significant data imbalance impedes progress: while spatial transcriptomics data is relatively abundant, spatial proteomics data remains scarce due to technical limitations and high costs. To overcome this challenge, we propose STProtein, a novel framework that leverages graph neural networks with a multi-task learning strategy. STProtein is designed to accurately predict unknown spatial protein expression using more accessible spatial multi-omics data, such as spatial transcriptomics. We believe that STProtein can effectively address the scarcity of spatial proteomics data, accelerating the integration of spatial multi-omics and potentially catalyzing transformative breakthroughs in the life sciences. This tool enables scientists to accelerate discovery by identifying complex and previously hidden spatial patterns of proteins within tissues, uncovering novel relationships between different marker genes, and exploring the biological "Dark Matter".
