DELST: Dual Entailment Learning for Hyperbolic Image-Gene Pretraining in Spatial Transcriptomics

Xulin Chen; Junzhou Huang

TL;DR

DELST pretrains image encoders on spatial transcriptomics (ST) data by building hierarchical priors into hyperbolic embeddings. It introduces Dual Entailment Learning, which consists of cross-modal entailment learning (CMEL) and intra-modal entailment learning (IMEL). Together these enforce hierarchical ordering across and within modalities, with embeddings projected onto the Lorentz hyperboloid. The final objective adds the two entailment losses to a contrastive loss, $L_{final}=L_{cont}+\lambda L_{ent\_cross}+\beta L_{ent\_intra}$, where $\lambda$ and $\beta$ weight the cross-modal and intra-modal terms. Experiments on ST benchmarks show consistent improvements in linear-probing performance over baseline methods, supporting the use of hierarchical, hyperbolic representations for image-gene pretraining in spatial transcriptomics. Code and models are available at https://github.com/XulinChen/DELST.
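To make the projection and the combined objective concrete, the sketch below lifts a Euclidean encoder output onto the Lorentz hyperboloid and assembles the weighted sum from the formula above. It is an illustration under stated assumptions, not the authors' implementation: it assumes the lift is done with the exponential map at the hyperboloid origin (a common choice in hyperbolic representation learning), and the names `lift_to_hyperboloid`, `final_loss`, `curv`, `lam`, and `beta` are hypothetical.

```python
import math

def lift_to_hyperboloid(v, curv=1.0):
    """Map a Euclidean vector, treated as a tangent vector at the hyperboloid
    origin, to Lorentz-model coordinates (time, space) via the exponential map.
    Assumption: DELST's exact projection may differ from this sketch."""
    sqrt_c = math.sqrt(curv)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        space = [0.0] * len(v)
    else:
        scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
        space = [scale * x for x in v]
    # The time coordinate follows from the constraint -t^2 + ||space||^2 = -1/curv.
    time = math.sqrt(1.0 / curv + sum(x * x for x in space))
    return time, space

def final_loss(l_cont, l_ent_cross, l_ent_intra, lam, beta):
    """L_final = L_cont + lambda * L_ent_cross + beta * L_ent_intra."""
    return l_cont + lam * l_ent_cross + beta * l_ent_intra
```

For example, `lift_to_hyperboloid([0.3, 0.4])` returns a point whose coordinates satisfy the hyperboloid constraint for curvature parameter 1. Any weight values passed as `lam` and `beta` here are placeholders, not values reported by the paper.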

Abstract

Spatial transcriptomics (ST) maps gene expression within tissue at individual spots, making it a valuable resource for multimodal representation learning. Additionally, ST inherently contains rich hierarchical information both across and within modalities. For instance, different spots exhibit varying numbers of nonzero gene expressions, corresponding to different levels of cellular activity and semantic hierarchies. However, existing methods rely on contrastive alignment of image-gene pairs, failing to accurately capture the intricate hierarchical relationships in ST data. Here, we propose DELST, the first framework to embed hyperbolic representations while modeling hierarchy for image-gene pretraining at two levels: (1) Cross-modal entailment learning, which establishes an order relationship between genes and images to enhance image representation generalization; (2) Intra-modal entailment learning, which encodes gene expression patterns as hierarchical relationships, guiding hierarchical learning across different samples at a global scale and integrating biological insights into single-modal representations. Extensive experiments on ST benchmarks annotated by pathologists demonstrate the effectiveness of our framework, achieving improved predictive performance compared to existing methods. Our code and models are available at: https://github.com/XulinChen/DELST.
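The order relationship that entailment learning enforces can be illustrated with an entailment-cone penalty of the kind used in earlier hyperbolic representation learning. The sketch below is an assumption about the general form of such a loss, not DELST's exact formulation: the penalty is zero when a `child` embedding falls inside the cone rooted at a `parent` embedding and grows with the angle by which it falls outside. Which modality or sample plays the parent role is deliberately left open, and `entailment_loss`, `min_radius`, and `eps` are hypothetical names and values.

```python
import math

def _time(space, curv):
    # Time coordinate implied by the hyperboloid constraint.
    return math.sqrt(1.0 / curv + sum(x * x for x in space))

def entailment_loss(parent, child, curv=1.0, min_radius=0.1, eps=1e-8):
    """Hinge penalty that is zero when `child` lies inside the entailment cone
    of `parent`. Both arguments are space components of hyperboloid points."""
    p_t, c_t = _time(parent, curv), _time(child, curv)
    p_norm = math.sqrt(sum(x * x for x in parent))
    # Lorentzian inner product <parent, child>_L.
    inner = sum(a * b for a, b in zip(parent, child)) - p_t * c_t
    # Half-aperture of the cone at `parent`; it narrows as `parent` moves
    # away from the origin.
    ratio = 2.0 * min_radius / (math.sqrt(curv) * p_norm + eps)
    aperture = math.asin(min(ratio, 1.0 - eps))
    # Exterior angle at `parent` between the cone axis and the geodesic to `child`.
    num = c_t + p_t * curv * inner
    den = p_norm * math.sqrt(max((curv * inner) ** 2 - 1.0, eps)) + eps
    exterior = math.acos(max(-1.0 + eps, min(1.0 - eps, num / den)))
    return max(0.0, exterior - aperture)
```

A child placed farther from the origin along the same direction as the parent incurs no penalty, while a child on the opposite side of the origin is penalized heavily. In a dual setup, one such term would be computed for cross-modal pairs and another for pairs within a modality.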
