Spatial Transcriptomics Analysis of Zero-shot Gene Expression Prediction

Yan Yang, Md Zakir Hossain, Xuesong Li, Shafin Rahman, Eric Stone

TL;DR

The paper tackles gene expression prediction for unseen gene types in spatial transcriptomics (ST) by introducing a semantic guided network (SGN). SGN combines (i) window feature extraction with graph-based refinement, (ii) LLM-generated functionality and phenotype descriptions that are embedded into a gene-type projection vector, and (iii) zero-shot prediction via a dot product between window and gene-type embeddings in a shared feature space, trained with a loss that combines mean squared error (MSE) and the Pearson correlation coefficient (PCC). Experiments on STNet and 10xProteomic show that SGN achieves zero-shot performance competitive with state-of-the-art supervised methods, which supports its generalization to unseen genes and the benefit of spatial context and LLM-informed descriptions. Because new gene types can be predicted without collecting new training data, the approach makes ST prediction more scalable and flexible; prompt design and optional internet-sourced domain knowledge help further.
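
To make the "loss that fuses MSE and PCC" concrete, here is a minimal NumPy sketch. The summary does not state how the two terms are combined, so the `1 - PCC` form, the `pcc_weight` parameter, and the function names are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def pcc(pred, target, eps=1e-8):
    """Pearson correlation per gene type (column), computed across windows (rows)."""
    pred_c = pred - pred.mean(axis=0, keepdims=True)
    target_c = target - target.mean(axis=0, keepdims=True)
    cov = (pred_c * target_c).sum(axis=0)
    denom = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0)) + eps
    return cov / denom

def mse_pcc_loss(pred, target, pcc_weight=1.0):
    """Hypothetical fusion: MSE plus a penalty that shrinks as correlation approaches 1."""
    mse = np.mean((pred - target) ** 2)
    corr = pcc(pred, target).mean()  # average correlation over gene types
    return mse + pcc_weight * (1.0 - corr)

# Toy expression matrix: 3 windows (rows) x 2 gene types (columns).
target = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0]])
perfect = mse_pcc_loss(target, target)    # both terms vanish (up to eps)
inverted = mse_pcc_loss(-target, target)  # large error and negative correlation
```

The MSE term penalizes errors in absolute expression values, while the correlation term rewards predictions that vary across windows in the same way as the ground truth, even when their scale is off.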

Abstract

Spatial transcriptomics (ST) captures gene expression within distinct regions (i.e., windows) of a tissue slide. Traditional supervised learning frameworks applied to model ST are constrained to predicting expression from slide image windows for gene types seen during training, failing to generalize to unseen gene types. To overcome this limitation, we propose a semantic guided network (SGN), a pioneering zero-shot framework for predicting gene expression from slide image windows. Considering a gene type can be described by functionality and phenotype, we dynamically embed a gene type to a vector per its functionality and phenotype, and employ this vector to project slide image windows to gene expression in feature space, unleashing zero-shot expression prediction for unseen gene types. The gene type functionality and phenotype are queried with a carefully designed prompt from a pre-trained large language model (LLM). On standard benchmark datasets, we demonstrate competitive zero-shot performance compared to past state-of-the-art supervised learning approaches.
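
The projection step described in the abstract can be sketched as a dot product between window embeddings and gene-type embeddings. In the sketch below, random vectors stand in for both: in SGN the window embeddings would come from the image feature extractor and graph refinement, and the gene-type embeddings from embedded LLM descriptions. The dimension `16`, the array names, and `predict_expression` are hypothetical.

```python
import numpy as np

def predict_expression(window_emb, gene_emb):
    """Map (num_windows, d) and (num_genes, d) embeddings to a
    (num_windows, num_genes) matrix of predicted expression values."""
    return window_emb @ gene_emb.T

rng = np.random.default_rng(0)
windows = rng.normal(size=(5, 16))      # stand-in window embeddings
seen_genes = rng.normal(size=(3, 16))   # stand-in embeddings of seen gene types
unseen_gene = rng.normal(size=(1, 16))  # stand-in embedding of a new gene type

seen_pred = predict_expression(windows, seen_genes)
# Adding an unseen gene type only appends a row to the gene embedding matrix;
# the window side is untouched and no output layer needs resizing.
all_pred = predict_expression(windows, np.vstack([seen_genes, unseen_gene]))
```

Because the set of gene types enters only through the embedding matrix, predictions for seen gene types are unchanged when a new gene type is added, which is what makes the zero-shot setting possible in this formulation.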

Paper Structure (22 sections, 7 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Overview of fields. Training with a dataset of seen gene types, (a) traditional approaches predict the expression of fixed gene types (i.e., seen gene types) for windows of a slide image; (b) by using a large language model (LLM) to describe functionality and phenotype of gene types of interest, we flexibly predict expression of seen and unseen gene types, i.e., zero-shot learning.
  • Figure 2: Our framework. We have three stages: i) window embedding, extracting and refining features from each window by using an extractor and a GraphSAGE network that explores relations of window spatial positions and feature similarities; ii) gene type embedding, querying gene type functionality and phenotype from an LLM, and embedding the description; iii) gene type prediction, performing dot product between window embedding and gene type embedding to compute gene expression.
  • Figure 3: Ablation study on the number of neighbors used in constructing $\mathcal{E}^{\texttt{fea}}$.
  • Figure 4: Ablation study of the pre-trained feature extractor (a-d: resnetclip-vit-gdino-v2) and LLM (d-f: neural-chat, falcon, zephyr, open-llama).
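
The captions for Figures 2 and 3 mention relating windows by spatial position and by feature similarity, with a tunable number of neighbors for $\mathcal{E}^{\texttt{fea}}$. A minimal sketch of building such neighbor edges is given below. The use of Euclidean distance for positions, cosine similarity for features, directed edges, and all function names are assumptions for illustration; the captions do not specify these details.

```python
import numpy as np

def knn_edges(scores, k):
    """Directed edges (i, j) from each node i to its k highest-scoring other nodes."""
    scores = scores.astype(float).copy()
    np.fill_diagonal(scores, -np.inf)  # exclude self-loops
    neighbors = np.argsort(-scores, axis=1)[:, :k]
    return [(i, int(j)) for i in range(scores.shape[0]) for j in neighbors[i]]

def spatial_edges(coords, k):
    """Link each window to its k nearest windows by position (assumed Euclidean)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return knn_edges(-dist, k)  # smaller distance means higher score

def feature_edges(features, k):
    """Link each window to its k most similar windows by feature (assumed cosine)."""
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return knn_edges(normed @ normed.T, k)

# Four windows: two spatial clusters and two feature clusters.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
e_spatial = spatial_edges(coords, k=1)
e_feature = feature_edges(feats, k=1)
```

The resulting edge lists could then be passed to a graph network such as GraphSAGE; varying `k` in `feature_edges` corresponds loosely to the neighbor-count ablation in Figure 3.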