DELST: Dual Entailment Learning for Hyperbolic Image-Gene Pretraining in Spatial Transcriptomics

Xulin Chen, Junzhou Huang

TL;DR

DELST addresses the challenge of pretraining image encoders with spatial transcriptomics (ST) data by integrating hierarchical priors through hyperbolic embeddings. It introduces Dual Entailment Learning, comprising cross-modal entailment learning (CMEL) and intra-modal entailment learning (IMEL), to enforce hierarchical ordering across and within modalities while projecting embeddings onto the Lorentz hyperboloid. The final objective adds the two entailment losses to a contrastive loss, $L_{final}=L_{cont}+\lambda L_{ent\_cross}+\beta L_{ent\_intra}$, where $\lambda$ and $\beta$ weight the cross-modal and intra-modal terms, so that the model learns more generalizable, biologically meaningful representations. Experiments on ST benchmarks show consistent improvements in linear-probing performance over strong baselines, supporting the utility of hierarchical, hyperbolic representations for image-gene pretraining in spatial transcriptomics. Code and models are available at https://github.com/XulinChen/DELST.
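
As a rough illustration of the two ingredients named above, the following NumPy sketch lifts Euclidean encoder outputs onto a Lorentz hyperboloid with the exponential map at the origin and then combines three scalar losses as in the final objective. It is a minimal sketch, not the authors' implementation: the function names, the curvature parameter `c`, and the example weights are assumptions chosen for illustration, and the actual values used by DELST are not stated in this summary.

```python
import numpy as np

def exp_map_origin(v, c=1.0):
    """Lift Euclidean vectors v of shape (n, d) onto the Lorentz hyperboloid
    of curvature -c via the exponential map at the origin.
    Returns the space components (n, d) and the time component (n,)."""
    v = np.asarray(v, dtype=np.float64)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    scaled = np.sqrt(c) * norm
    # sinh(r) / r tends to 1 as r tends to 0, so guard the division.
    safe = np.where(scaled > 1e-8, scaled, 1.0)
    factor = np.where(scaled > 1e-8, np.sinh(safe) / safe, 1.0)
    space = factor * v
    # The time component follows from the constraint <x, x>_L = -1/c.
    time = np.sqrt(1.0 / c + np.sum(space**2, axis=-1))
    return space, time

def lorentz_inner(xs, xt, ys, yt):
    """Lorentzian inner product <x, y>_L = <x_space, y_space> - x_time * y_time."""
    return np.sum(xs * ys, axis=-1) - xt * yt

def final_objective(l_cont, l_ent_cross, l_ent_intra, lam, beta):
    """L_final = L_cont + lambda * L_ent_cross + beta * L_ent_intra."""
    return l_cont + lam * l_ent_cross + beta * l_ent_intra

# Illustrative usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
v = rng.normal(scale=0.5, size=(4, 8))
space, time = exp_map_origin(v, c=0.5)
constraint = lorentz_inner(space, time, space, time)  # each entry should be -1/c
```

Checking that `constraint` equals `-1/c` for every row is a cheap sanity test that the lifted points really lie on the hyperboloid.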

Abstract

Spatial transcriptomics (ST) maps gene expression within tissue at individual spots, making it a valuable resource for multimodal representation learning. Additionally, ST inherently contains rich hierarchical information both across and within modalities. For instance, different spots exhibit varying numbers of nonzero gene expressions, corresponding to different levels of cellular activity and semantic hierarchies. However, existing methods rely on contrastive alignment of image-gene pairs, failing to accurately capture the intricate hierarchical relationships in ST data. Here, we propose DELST, the first framework to embed hyperbolic representations while modeling hierarchy for image-gene pretraining at two levels: (1) Cross-modal entailment learning, which establishes an order relationship between genes and images to enhance image representation generalization; (2) Intra-modal entailment learning, which encodes gene expression patterns as hierarchical relationships, guiding hierarchical learning across different samples at a global scale and integrating biological insights into single-modal representations. Extensive experiments on ST benchmarks annotated by pathologists demonstrate the effectiveness of our framework, achieving improved predictive performance compared to existing methods. Our code and models are available at: https://github.com/XulinChen/DELST.
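
The hierarchy signal mentioned in the abstract, the number of nonzero gene expressions per spot, can be computed directly from a spot-by-gene count matrix. The sketch below ranks spots by that count and forms (lower-count, higher-count) index pairs that an intra-modal entailment loss could consume. The pairing rule here (matching the lowest-count half against the highest-count half) and all function names are hypothetical choices for illustration; the summary does not say how DELST selects pairs.

```python
import numpy as np

def rank_spots(expr):
    """Rank spots (rows of expr) by their number of nonzero gene expressions,
    in descending order. Returns the ranking and the per-spot counts."""
    counts = np.count_nonzero(expr, axis=1)
    order = np.argsort(-counts, kind="stable")
    return order, counts

def make_low_high_pairs(expr):
    """Hypothetical pairing rule: match the spot with the fewest nonzero genes
    to the spot with the most, the second fewest to the second most, and so on.
    Pairs with equal counts are dropped because they carry no ordering."""
    order, counts = rank_spots(expr)
    half = len(order) // 2
    high = order[:half]        # highest counts first
    low = order[::-1][:half]   # lowest counts first
    keep = counts[high] > counts[low]
    return list(zip(low[keep].tolist(), high[keep].tolist()))

# Toy count matrix: 4 spots x 4 genes.
expr = np.array([
    [0, 3, 0, 0],   # 1 nonzero gene
    [1, 2, 5, 7],   # 4 nonzero genes
    [0, 0, 4, 1],   # 2 nonzero genes
    [2, 1, 0, 6],   # 3 nonzero genes
])
order, counts = rank_spots(expr)
pairs = make_low_high_pairs(expr)
```

Each resulting pair is an ordered (lower-count spot, higher-count spot) tuple of row indices, which can index either the gene embeddings or the image embeddings of the same spots.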

Paper Structure

This paper contains 19 sections, 10 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Characteristics of ST Data. (a) The spot radius distribution for human breast, the tissue type with the largest sample size in the STimage-1K4M dataset [chen2024stimage1k4m], spans a wide range. (b) The spots on the same slide are ranked in descending order by the number of nonzero gene expressions per spot, indicating varying levels of cellular activity across spots.
  • Figure 2: Overview of DELST. (a) Spot images and gene expressions are encoded separately and projected into hyperbolic space via the exponential map. DELST enforces cross-modal and intra-modal hierarchies by positioning broader concepts nearer to the hyperboloid’s origin and finer-grained concepts farther from it. (b) $\mathbf{H}$ (HNGEC spot) corresponds to a finer-grained level of the hierarchy than $\mathbf{L}$ (LNGEC spot). This intra-modal entailment relationship is applied independently to the gene ($\mathbf{L^{G}}$, $\mathbf{H^{G}}$) and image ($\mathbf{L^{I}}$, $\mathbf{H^{I}}$) modalities. (c) The image embedding $\mathbf{I}$ is pushed to lie within the cone projected by its paired gene embedding $\mathbf{G}$.
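
The cone constraint in Figure 2(c) can be sketched with the entailment-cone formulation used in earlier hyperbolic image-text work (MERU): a broader embedding defines a cone whose half-aperture shrinks with distance from the origin, and a paired finer-grained embedding is penalized by the amount its exterior angle exceeds that aperture. Whether DELST uses exactly this form is an assumption; the constant `k`, the curvature `c`, and all function names below are illustrative rather than taken from the paper.

```python
import numpy as np

def exp_map_origin(v, c=1.0):
    """Exponential map at the origin of the Lorentz hyperboloid (curvature -c)."""
    v = np.asarray(v, dtype=np.float64)
    scaled = np.sqrt(c) * np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(scaled > 1e-8, scaled, 1.0)
    space = np.where(scaled > 1e-8, np.sinh(safe) / safe, 1.0) * v
    time = np.sqrt(1.0 / c + np.sum(space**2, axis=-1))
    return space, time

def half_aperture(xs, c=1.0, k=0.1, eps=1e-8):
    """Half-aperture of the cone at x: asin(2k / (sqrt(c) * ||x_space||))."""
    norm = np.linalg.norm(xs, axis=-1)
    ratio = np.clip(2.0 * k / (np.sqrt(c) * norm + eps), -1 + eps, 1 - eps)
    return np.arcsin(ratio)

def exterior_angle(xs, xt, ys, yt, c=1.0, eps=1e-8):
    """Exterior angle at x between the ray from the origin and the geodesic to y."""
    c_xy = c * (np.sum(xs * ys, axis=-1) - xt * yt)  # c * <x, y>_L
    num = yt + xt * c_xy
    denom = np.linalg.norm(xs, axis=-1) * np.sqrt(np.clip(c_xy**2 - 1.0, eps, None))
    return np.arccos(np.clip(num / (denom + eps), -1 + eps, 1 - eps))

def entailment_loss(xs, xt, ys, yt, c=1.0, k=0.1):
    """Mean of max(0, exterior_angle - half_aperture): zero when y is inside x's cone."""
    gap = exterior_angle(xs, xt, ys, yt, c) - half_aperture(xs, c, k)
    return float(np.maximum(0.0, gap).mean())

# Toy check: the gene embedding G defines the cone and the image embedding I should fall inside it.
u = np.array([[1.0, 0.0, 0.0]])
g_space, g_time = exp_map_origin(1.0 * u)           # broader concept, nearer the origin
i_in_space, i_in_time = exp_map_origin(2.0 * u)     # same direction, farther out
i_out_space, i_out_time = exp_map_origin(-2.0 * u)  # opposite direction
loss_inside = entailment_loss(g_space, g_time, i_in_space, i_in_time)
loss_outside = entailment_loss(g_space, g_time, i_out_space, i_out_time)
```

In this toy check the image point on the same ray as the gene point incurs no penalty, while the point on the opposite side of the origin is penalized heavily. The same function could be reused for the intra-modal case in Figure 2(b) by passing the $\mathbf{L}$ embedding as the cone-defining argument and the $\mathbf{H}$ embedding as the point to be contained.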