Fully Hyperbolic Neural Networks

Weize Chen, Xu Han, Yankai Lin, Hexu Zhao, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

TL;DR

This work addresses a limitation of existing hyperbolic networks, which carry out most operations in the tangent space, by introducing a fully hyperbolic framework grounded in the Lorentz model. Using Lorentz boosts and rotations, the authors design a complete set of neural operations, including a hyperbolic linear layer, attention, residual connections, and position encoding, all of which stay within hyperbolic space. They prove that the tangent-space linear layers used by existing networks are a relaxation of the Lorentz rotation and do not include the boost, and they report NLP experiments in which HyboNet achieves better or comparable results with fewer parameters and improved stability. The approach offers a new direction for hyperbolic representation learning, with practical benefits for both shallow and deep models; the authors state that code will be released for follow-up research.

Abstract

Hyperbolic neural networks have shown great potential for modeling complex data. However, existing hyperbolic networks are not completely hyperbolic, as they encode features in a hyperbolic space yet formalize most of their operations in the tangent space (a Euclidean subspace) at the origin of the hyperbolic space. This hybrid method greatly limits the modeling ability of networks. In this paper, we propose a fully hyperbolic framework to build hyperbolic networks based on the Lorentz model by adapting the Lorentz transformations (including boost and rotation) to formalize essential operations of neural networks. Moreover, we also prove that linear transformation in tangent spaces used by existing hyperbolic networks is a relaxation of the Lorentz rotation and does not include the boost, implicitly limiting the capabilities of existing hyperbolic networks. The experimental results on four NLP tasks show that our method has better performance for building both shallow and deep networks. Our code will be released to facilitate follow-up research.

Paper Structure

This paper contains 46 sections, 3 theorems, 23 equations, 2 figures, 12 tables.

Key Result

Theorem 1

$\forall \mathbf{x} \in \mathbb{L}^n_K, \forall \mathbf{M} \in \mathbb{R}^{(m+1)\times(n+1)}$, we have $f_{\mathbf{x}}(\mathbf{M})\mathbf{x} \in \mathbb{L}^m_K$.
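The statement above can be checked numerically. The sketch below is not the authors' code, and this summary does not reproduce the definition of $f_{\mathbf{x}}(\mathbf{M})$. It assumes that $\mathbf{M}$ is split into a first row $\mathbf{v}^\top$ and a remaining block $\mathbf{W}$, that $f_{\mathbf{x}}(\mathbf{M})$ rescales the first row by $\sqrt{\|\mathbf{W}\mathbf{x}\|^2 - 1/K} / (\mathbf{v}^\top\mathbf{x})$, and that $\mathbb{L}^n_K$ is the set of points with $\langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = 1/K$, $x_0 > 0$, $K < 0$. The helper names `lorentz_inner`, `random_point`, and `f_x` are hypothetical.

```python
import numpy as np


def lorentz_inner(a, b):
    """Lorentzian inner product: -a_0 * b_0 + sum_i a_i * b_i."""
    return -a[0] * b[0] + np.dot(a[1:], b[1:])


def random_point(n, K, rng):
    """Sample a point on L^n_K (assumed: <x, x>_L = 1/K, x_0 > 0, K < 0)."""
    space = rng.normal(size=n)
    # Solve -x_0^2 + ||space||^2 = 1/K for the positive time coordinate.
    time = np.sqrt(np.dot(space, space) - 1.0 / K)
    return np.concatenate(([time], space))


def f_x(M, x, K):
    """Assumed form of f_x(M): rescale the first row of M, keep the rest.

    Requires v . x != 0, which holds almost surely for random inputs.
    """
    v, W = M[0], M[1:]
    Wx = W @ x
    scale = np.sqrt(np.dot(Wx, Wx) - 1.0 / K) / np.dot(v, x)
    return np.vstack((scale * v, W))


rng = np.random.default_rng(0)
K, n, m = -1.0, 4, 3
x = random_point(n, K, rng)                # x in L^n_K, shape (n + 1,)
M = rng.normal(size=(m + 1, n + 1))        # arbitrary real matrix
y = f_x(M, x, K) @ x                       # candidate point in L^m_K

print("<y, y>_L =", lorentz_inner(y, y), " expected:", 1.0 / K)
print("y_0 > 0:", y[0] > 0)
```

Under these assumptions the time coordinate of the output is always $\sqrt{\|\mathbf{W}\mathbf{x}\|^2 - 1/K}$, so the hyperboloid constraint holds by construction regardless of the values in $\mathbf{M}$.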

Figures (2)

  • Figure 1: Illustration of a hyperbolic linear layer based on the logarithmic and exponential maps, together with different transformations in the Lorentz model. In the logarithmic/exponential-map panel, $A$ is mapped to $B$ in the tangent space at the origin $\mathcal{T}_{\mathbf{0}}\mathbb{L}^n_K$ through the logarithmic map. A Euclidean linear transformation is then applied to obtain $C$. Finally, $C$ is mapped back to the hyperbolic space through the exponential map. The Lorentz boost and Lorentz rotation panels visualize the two transformations, where points on the intersection of a plane and the hyperboloid are still coplanar after the Lorentz boost. The pseudo-rotation panel shows the pseudo-rotation from the section relating the method to tangent-space layers, where a point is first transformed and then projected onto the hyperboloid.
  • Figure 2: Validation curves of knowledge graph models.
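The tangent-space pipeline described in the Figure 1 caption (logarithmic map at the origin, Euclidean linear transformation, exponential map) can be sketched as follows. This is a minimal illustration using the standard closed-form maps at the origin of the Lorentz model with curvature $K < 0$, not the authors' implementation. The function names `expmap0`, `logmap0`, and `tangent_linear` are hypothetical, and tangent vectors at the origin are represented by their spatial part only, since their time component is zero.

```python
import numpy as np


def expmap0(u, K):
    """Exponential map at the origin of L^n_K.

    u is the spatial part of a tangent vector at the origin.
    """
    s = np.sqrt(-K)
    norm = np.linalg.norm(u)
    if norm == 0.0:
        return np.concatenate(([1.0 / s], u))       # the origin itself
    time = np.cosh(s * norm) / s
    space = np.sinh(s * norm) * u / (s * norm)
    return np.concatenate(([time], space))


def logmap0(x, K):
    """Logarithmic map at the origin; returns the spatial part of the tangent vector."""
    s = np.sqrt(-K)
    space = x[1:]
    norm = np.linalg.norm(space)
    if norm == 0.0:
        return np.zeros_like(space)
    # Clamp to 1 to guard against rounding just below the arccosh domain.
    dist = np.arccosh(np.maximum(s * x[0], 1.0)) / s
    return dist * space / norm


def tangent_linear(x, W, K):
    """Tangent-space linear layer: A -> log map -> B -> W @ B -> exp map -> C."""
    return expmap0(W @ logmap0(x, K), K)


rng = np.random.default_rng(0)
K = -1.0
u = 0.5 * rng.normal(size=4)          # tangent vector at the origin
A = expmap0(u, K)                     # point on the hyperboloid
W = 0.5 * rng.normal(size=(3, 4))     # Euclidean weight matrix
C = tangent_linear(A, W, K)           # output point on L^3_K

print("round trip error:", np.linalg.norm(logmap0(A, K) - u))
print("<C, C>_L =", -C[0] ** 2 + np.dot(C[1:], C[1:]))
```

Every intermediate step here is Euclidean except the two maps at the origin, which is the hybrid design that the abstract contrasts with the fully hyperbolic layers.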

Theorems & Definitions (8)

  • Definition 1: Lorentz Boost
  • Definition 2: Lorentz Rotation
  • Theorem 1
  • Proof 1
  • Lemma 1
  • Proof 2
  • Lemma 2
  • Proof 3