The Numerical Stability of Hyperbolic Representation Learning

Gal Mishne; Zhengchao Wan; Yusu Wang; Sheng Yang

TL;DR

This work analyzes the numerical stability of hyperbolic representation learning, comparing the Poincaré ball and Lorentz model and identifying optimization-driven advantages for the Lorentz model despite its smaller representational capacity. It introduces a Euclidean parametrization of hyperbolic space that preserves full capacity while yielding optimization dynamics similar to the Lorentz model, and extends this approach to hyperplanes and a new hyperbolic SVM formulation (LSVMPP). The authors provide theoretical results on gradient behavior, radius-based representation limits, and isometric transitions between models, complemented by empirical evidence from tree embeddings and multiclass SVM tasks. The proposed Euclidean parametrization improves robustness and performance, offering practical guidelines for stable hyperbolic learning and scalable hierarchical representation.

Abstract

Given the exponential growth of the volume of a ball with respect to its radius, hyperbolic space can embed trees with arbitrarily small distortion and has therefore received wide attention for representing hierarchical datasets. However, this exponential growth comes at the price of numerical instability: training hyperbolic learning models sometimes leads to catastrophic NaN problems, where values become unrepresentable in floating-point arithmetic. In this work, we carefully analyze the limitations of two popular models of hyperbolic space, namely the Poincaré ball and the Lorentz model. We first show that, under 64-bit arithmetic, the Poincaré ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincaré ball from the perspective of optimization. Given the numerical limitations of both models, we identify a Euclidean parametrization of hyperbolic space that can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and demonstrate its ability to improve the performance of hyperbolic SVM.
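The capacity claim in the abstract can be probed with a few lines of float64 arithmetic. The following is a rough sketch, not the authors' analysis: it assumes curvature $-1$, the standard Poincaré distance from the origin $d(0,x)=2\,\mathrm{artanh}(\|x\|)$, and a one-dimensional Lorentz point $(\cosh r, \sinh r)$. The helper names and the tolerance `1e-6` are hypothetical choices made for illustration.

```python
import math


def poincare_radius(k: int) -> float:
    """Hyperbolic distance from the origin when ||x|| = 1 - 10**-k.

    Uses 2 * artanh(s) = ln((1 + s) / (1 - s)), which for s = 1 - 10**-k
    simplifies to ln(2 * 10**k - 1). Exact integers avoid rounding in the
    argument of the logarithm.
    """
    return math.log(2 * 10**k - 1)


def norm_is_representable(k: int) -> bool:
    """True if 1 - 10**-k is still strictly below 1 in float64."""
    return 1.0 - 10.0**-k < 1.0


def lorentz_constraint_holds(r: float, tol: float = 1e-6) -> bool:
    """Check x0**2 - x1**2 == 1 numerically for the point (cosh r, sinh r).

    The tolerance is an illustrative assumption, not a value from the paper.
    """
    x0, x1 = math.cosh(r), math.sinh(r)
    return abs((x0 * x0 - x1 * x1) - 1.0) < tol


if __name__ == "__main__":
    for k in (1, 8, 16, 17):
        print(k, norm_is_representable(k), poincare_radius(k))
    for r in (5.0, 15.0, 25.0):
        print(r, lorentz_constraint_holds(r))
```

In this sketch the Poincaré norm stops being distinguishable from 1 only once $k$ reaches 17, while the Lorentz constraint already fails at a much smaller hyperbolic radius because $\cosh^2 r$ and $\sinh^2 r$ grow like $e^{2r}$ and their difference is lost to rounding.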

Paper Structure (43 sections, 7 theorems, 64 equations, 8 figures, 5 tables)

Introduction
Our Contributions
Preliminary
Poincaré Ball
Lorentz Model
Notation for Norms and Gradients
Transition Between the Two Models
Operations on $\mathbb{H}^n$
Comparing Lorentz and Poincaré Models
Poincaré Ball
Lorentz Model
Optimization
Euclidean Parametrization of Hyperbolic Space
Feature Parametrization
Optimization
...and 28 more sections

Key Result

Proposition 3.1

For any point $x\in\mathbb{D}^n$, if $\left\|x\right\|=1-10^{-k}$ for some positive number $k$, then in fact,

Figures (8)

Figure 1: Simulated Tree 1 and Tree 2 in $\mathbb{R}^2$. The 2D coordinates of each node are features and pairwise distances are computed through shortest path distance on the connected graph.
Figure 2: Embedding of Tree 1 (top row) and Tree 2 (bottom row) at the final epoch with Poincaré (left), Lorentz (mid), and Euclidean parametrization (right).
Figure 3: Median Riemannian gradient norm ratios by epoch for Tree 1 (left) and Tree 2 (right). Poincaré gradients have significantly smaller norms than the others.
Figure 4: Poincaré visualization of the CIFAR, Fashion-MNIST, and Paul datasets. Each color corresponds to one label class.
Figure 5: Simulated trees in $\mathbb{R}^2$. We label the figures Tree 1 - 4 (top row) and Tree 5 - 8 (bottom row). The 2D coordinates of each node are features, and pairwise distances are computed through shortest path distance on the connected graph.
...and 3 more figures

Theorems & Definitions (13)

Proposition 3.1: Poincaré radius
Proposition 3.2: Lorentz radius
Proposition 3.3: Gradient descent is the "same" for both models
Lemma 3.4
Theorem 3.5
Remark 4.1: The mysterious "2"
Theorem 4.2
Remark 4.3
Remark 4.4
Remark 4.5: What if $z=0$?
...and 3 more
