Learning Protein-Ligand Binding in Hyperbolic Space

Jianhui Wang, Wenyu Zhu, Bowen Gao, Xin Hong, Ya-Qin Zhang, Wei-Ying Ma, Yanyan Lan

TL;DR

HypSeek is a hyperbolic representation learning framework that embeds ligands, protein pockets, and sequences into Lorentz-model hyperbolic space. It unifies virtual screening and affinity ranking in a single framework and introduces a protein-guided three-tower architecture to enhance representational structure.

Abstract

Protein-ligand binding prediction is central to virtual screening and affinity ranking, two fundamental tasks in drug discovery. While recent retrieval-based methods embed ligands and protein pockets into Euclidean space for similarity-based search, the geometry of Euclidean embeddings often fails to capture the hierarchical structure and fine-grained affinity variations intrinsic to molecular interactions. In this work, we propose HypSeek, a hyperbolic representation learning framework that embeds ligands, protein pockets, and sequences into Lorentz-model hyperbolic space. By leveraging the exponential geometry and negative curvature of hyperbolic space, HypSeek enables expressive, affinity-sensitive embeddings that can effectively model both global activity and subtle functional differences, particularly in challenging cases such as activity cliffs, where structurally similar ligands exhibit large affinity gaps. Our model unifies virtual screening and affinity ranking in a single framework, introducing a protein-guided three-tower architecture to enhance representational structure. HypSeek improves early enrichment in virtual screening on DUD-E from 42.63 to 51.44 (+20.7%) and affinity ranking correlation on JACS from 0.5774 to 0.7239 (+25.4%), demonstrating the benefits of hyperbolic geometry across both tasks and highlighting its potential as a powerful inductive bias for protein-ligand modeling.
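
As a rough illustration of what embedding into Lorentz-model hyperbolic space involves, the sketch below lifts plain Euclidean vectors onto the unit hyperboloid with the exponential map at the origin and measures geodesic distance between the lifted points. It assumes curvature $-1$, and the function names (`lorentz_inner`, `exp_map_origin`, `lorentz_distance`) and the toy 2-D vectors are hypothetical; this is not HypSeek's encoder code, only the standard Lorentz-model formulas.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp_map_origin(u):
    """Lift a Euclidean (tangent) vector u onto the unit hyperboloid.

    Assumes curvature -1; the result x satisfies <x, x>_L = -1.
    """
    norm = math.sqrt(sum(a * a for a in u))
    if norm == 0.0:
        return [1.0] + [0.0] * len(u)  # the hyperboloid origin
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in u]


def lorentz_distance(x, y):
    """Geodesic distance arccosh(-<x, y>_L), clamped for numerical safety."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


# Toy 2-D "encoder outputs" (hypothetical values, for illustration only).
ligand = exp_map_origin([0.3, -0.4])
pocket = exp_map_origin([1.2, 0.5])

print("on hyperboloid:", lorentz_inner(ligand, ligand))
print("ligand-pocket distance:", lorentz_distance(ligand, pocket))
```

In a retrieval setting, a (negated) distance of this kind could serve as the similarity score between a pocket and candidate ligands; how HypSeek actually scores pairs is not specified in this summary.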

Paper Structure

This paper contains 25 sections, 1 theorem, 36 equations, 3 figures, 6 tables.

Key Result

Proposition 1

(Hyperbolic Separation of Activity Cliffs) Let $\ell_1, \ell_2$ be structurally similar ligands with large affinity differences. Under constant radial norm and small angular deviation, hyperbolic embeddings yield significantly larger geodesic distance than their Euclidean counterparts: This highlights the capacity of hyperbolic geometry to distinguish functionally divergent ligands without distor
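
The geometric intuition behind this proposition can be checked numerically. The sketch below is an assumption-laden illustration, not the paper's proof: the exact inequality is not reproduced in this summary, so the comparison made here (two points at the same radius $r$ separated by a small angle $\theta$, measured once as a Euclidean chord and once as a hyperbolic geodesic at curvature $-1$) is one plausible reading of "constant radial norm and small angular deviation". The hyperbolic distance uses the hyperbolic law of cosines, $\cosh d = \cosh^2 r - \sinh^2 r \cos\theta$, rewritten in the equivalent form $1 + 2\sinh^2 r \,\sin^2(\theta/2)$.

```python
import math


def euclidean_chord(r, theta):
    """Distance between two Euclidean points of norm r separated by angle theta."""
    return 2.0 * r * math.sin(theta / 2.0)


def hyperbolic_geodesic(r, theta):
    """Geodesic distance between two points at hyperbolic radius r, angle theta apart.

    Hyperbolic law of cosines at curvature -1 (an assumption), written as
    cosh d = 1 + 2 * sinh(r)^2 * sin(theta/2)^2.
    """
    s = math.sinh(r) * math.sin(theta / 2.0)
    return math.acosh(1.0 + 2.0 * s * s)


theta = 0.2  # small angular deviation between two similar ligands (toy value)
for r in (1.0, 3.0, 5.0):
    d_e = euclidean_chord(r, theta)
    d_h = hyperbolic_geodesic(r, theta)
    print(f"r={r:.1f}  euclidean={d_e:.4f}  hyperbolic={d_h:.4f}  ratio={d_h / d_e:.2f}")
```

Under these assumptions the hyperbolic distance exceeds the Euclidean one at every radius, and the gap widens as the points move away from the origin, which is the behavior the proposition attributes to activity-cliff pairs.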

Figures (3)

  • Figure 1: Illustration of how hyperbolic geometry distinguishes activity cliffs (PDB ID: 5EHR). Left: Two structurally similar ligands (Ligand ID: 5OD vs. its amino-substituent-removed derivative) show an $\sim$80-fold affinity difference. Right: The yellow and red points denote the two ligands; the blue point is the pocket. Dashed lines show distances in hyperbolic (red/light blue) and Euclidean (dark blue) space. Euclidean embeddings preserve structural similarity but fail to reflect affinity gaps, while hyperbolic embeddings separate such pairs via both radial and angular dimensions ($D_H$, green), enabling affinity-sensitive representations.
  • Figure 2: Overall architecture of HypSeek: three encoders lift ligands, pockets and protein sequences to a shared hyperbolic space (left); contrastive and list‑wise ranking losses align pocket/sequence with ligands while the cone–hierarchy loss imposes radial–angular tiers around each pocket (right).
  • Figure 3: Pairwise analysis and CO-SNE visualization on the JACS benchmark. (A) Accuracy of affinity change prediction on ligand pairs with different ECFP4 similarity, comparing Euclidean and hyperbolic spaces; (B) Pearson's $R$ between predicted score difference and ground truth affinity gap; (C) CO-SNE visualization of ligand embeddings in hyperbolic space without the hyperbolic constraint loss; (D) CO-SNE visualization of our HypSeek ligand embeddings.

Theorems & Definitions (1)

  • Proposition 1