
A Geometry-Aware Algorithm to Learn Hierarchical Embeddings in Hyperbolic Space

Zhangyu Wang, Lantian Xu, Zhifeng Kong, Weilong Wang, Xuyu Peng, Enyang Zheng

TL;DR

The paper addresses learning hierarchical representations in hyperbolic space, where the geometric mismatch with Euclidean methods impedes optimization. It defines three categories of illness, each describing a kind of misordered relationship in the learned embedding, and proposes a geometry-aware framework that combines a dilation mapping with transitive closure regularization to mitigate them. A theoretical analysis covers the dilation mechanism in the Poincaré ball $\mathcal{B}^d$, with optimization carried out on the Riemannian manifold under the hyperbolic distance $d(\cdot,\cdot)$. The authors formalize local capacity, bound the $r$-packing number $\mathcal{A}(d,\theta_r)$, and validate the approach on synthetic and real-world tree datasets, reporting improved mean average precision (MAP) and mean rank (MR) over baselines. By exploiting hyperbolic geometry explicitly through dilation and transitive-closure-inspired regularization, the work aims at practical, scalable hyperbolic embeddings for tree-like data and offers guidance for future geometry-aware representation learning.
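The summary names the distance $d(\cdot,\cdot)$ on the Poincaré ball without writing it out. As a minimal sketch, assuming it is the standard Poincaré-ball geodesic distance $d(u,v) = \operatorname{arcosh}\left(1 + \frac{2\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$, it can be computed as follows. The function name `poincare_distance` and the sample points are illustrative and do not come from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Assumes the standard Poincare-ball model; the name is hypothetical.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # The argument is always >= 1 because every term is non-negative.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


origin = (0.0, 0.0)
near = (0.5, 0.0)
far = (0.9, 0.0)

# Equal Euclidean steps cost more hyperbolic distance near the boundary.
print(poincare_distance(origin, near))
print(poincare_distance(near, far))
```

From the origin the distance reduces to $\ln\frac{1+r}{1-r}$ for a point at Euclidean radius $r$, so it grows without bound as $r \to 1$. That extra room near the boundary is what lets tree-like data be embedded with little distortion.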

Abstract

Hyperbolic embeddings are a class of representation learning methods that offer competitive performance when data can be abstracted as a tree-like graph. In practice, however, learning hyperbolic embeddings of hierarchical data is difficult because the geometry of hyperbolic space differs from that of Euclidean space. To address these difficulties, we first categorize three kinds of illness that harm the performance of the embeddings. We then develop a geometry-aware algorithm that uses a dilation operation and a transitive closure regularization to tackle these illnesses. We empirically validate these techniques and present a theoretical analysis of the mechanism behind the dilation operation. Experiments on synthetic and real-world datasets show that our algorithm outperforms the baselines.

Paper Structure

This paper contains 21 sections, 16 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Visualizations of two-dimensional embeddings of a synthetic balanced tree (156 nodes, 155 edges) learned by the baseline Poincaré embedding algorithm of Nickel and Kiela (2017) and by our geometry-aware algorithm, respectively. Both algorithms are trained for 3000 epochs. The lines show ground-truth edges and the points show the learned hyperbolic embeddings. Red lines mark bad cases, where the embeddings fail to reconstruct the ground-truth edge.
  • Figure 2: Visualization of the learning process of two-dimensional Poincaré embeddings with (1) the baseline algorithm in the upper row and (2) the geometry-aware algorithm in the lower row. The dataset, baseline model, and plotting settings are identical to those in Figure 1. We highlight the intra-subtree and inter-subtree illness that persists throughout the entire 3000 epochs.
  • Figure 3: Illustration of the three categories of illness. $A$ is the source node and $B$ is the ground-truth target node. It is called (1) capacity illness if $A$ connects to $B_1$, (2) intra-subtree illness if $A$ connects to $B_2$, and (3) inter-subtree illness if $A$ connects to $B_3$.
  • Figure 4: Number of occurrences of each kind of illness under different values of $\eta_{tc}$. Base denotes the baseline algorithm.
  • Figure 5: Performance of our algorithm under different hyperparameter settings on the synthetic tree. Base denotes the baseline algorithm and GA denotes our algorithm. DL is the dilation operation, and RW is the re-weighting strategy, with the number that follows it giving the threshold epoch number $N_{\mathrm{tc}}$.

Theorems & Definitions (2)

  • Definition 1: Categories of Illness
  • Definition 2: Local Capacity