Lorentzian Residual Neural Networks

Neil He, Menglin Yang, Rex Ying

TL;DR

This work introduces LResNet, a Lorentzian residual network that embeds residual connections directly on the Lorentz hyperboloid using a weighted Lorentzian centroid. By eliminating mappings to tangent spaces and parallel transport, LResNet achieves superior numerical stability, commutativity, and computational efficiency, while preserving hyperbolic structure and enabling theoretical derivations of prior methods. The approach is demonstrated across graph neural networks, graph transformers, and vision models, yielding consistent improvements over Euclidean and existing hyperbolic residual methods, and showing remarkable speedups in computation. These results highlight LResNet's potential to enable more expressive and robust hyperbolic architectures across diverse domains, with broad applicability to CNNs, GNNs, and graph Transformers.

Abstract

Hyperbolic neural networks have emerged as a powerful tool for modeling hierarchical data structures prevalent in real-world datasets. Notably, residual connections, which facilitate the direct flow of information across layers, have been instrumental in the success of deep neural networks. However, current methods for constructing hyperbolic residual networks suffer from limitations such as increased model complexity, numerical instability, and errors due to multiple mappings to and from the tangent space. To address these limitations, we introduce LResNet, a novel Lorentzian residual neural network based on the weighted Lorentzian centroid in the Lorentz model of hyperbolic geometry. Our method enables the efficient integration of residual connections in Lorentz hyperbolic neural networks while preserving their hierarchical representation capabilities. We demonstrate that our method can theoretically derive previous methods while offering improved stability, efficiency, and effectiveness. Extensive experiments on both graph and vision tasks showcase the superior performance and robustness of our method compared to state-of-the-art Euclidean and hyperbolic alternatives. Our findings highlight the potential of LResNet for building more expressive neural networks in hyperbolic embedding space as a generally applicable method to multiple architectures, including CNNs, GNNs, and graph Transformers.

Paper Structure

This paper contains 14 sections, 3 theorems, 17 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

Lemma 4.1

$\sqrt{-K}|\|w_x\mathbf{x}+w_y\mathbf{y}\|_\mathcal{L}|>\sqrt{w_x^2 + w_y^2}\,$ for any $\,\mathbf{x},\mathbf{y}\in\mathbb{L}^{K,n}$ and $(w_x,w_y)\in\mathbb{R}^+\times\mathbb{R}^+\setminus\{(0,0)\}$, where $\mathbb{R}^+$ denotes the set of non-negative real numbers.
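The lemma can be checked numerically with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the convention $\langle\mathbf{x},\mathbf{x}\rangle_\mathcal{L}=1/K$ with $K<0$ and the time coordinate stored first, it samples strictly positive weights, and it assumes the residual connection is the weighted sum divided by the quantity on the left-hand side of the lemma (the weighted Lorentzian centroid). The function names `lorentz_inner`, `random_hyperboloid_points`, `lemma_lhs`, and `lorentz_residual` are hypothetical.

```python
import numpy as np

def lorentz_inner(u, v):
    """Lorentzian inner product -u_0 v_0 + sum_i u_i v_i (time coordinate first)."""
    return -u[..., 0] * v[..., 0] + np.sum(u[..., 1:] * v[..., 1:], axis=-1)

def random_hyperboloid_points(rng, num, n, K):
    """Sample points with <x, x>_L = 1/K by lifting random space-like coordinates.
    Assumed convention: K < 0 and the time coordinate is positive."""
    space = rng.normal(size=(num, n))
    time = np.sqrt(np.sum(space**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([time, space], axis=-1)

def lemma_lhs(x, y, w_x, w_y, K):
    """Left-hand side of the lemma: sqrt(-K) * | ||w_x x + w_y y||_L |."""
    s = np.asarray(w_x)[..., None] * x + np.asarray(w_y)[..., None] * y
    return np.sqrt(-K) * np.sqrt(np.abs(lorentz_inner(s, s)))

def lorentz_residual(x, y, w_x, w_y, K):
    """Assumed form of the residual connection: the weighted sum rescaled
    by the lemma's left-hand side so the result lies on the hyperboloid."""
    s = np.asarray(w_x)[..., None] * x + np.asarray(w_y)[..., None] * y
    return s / lemma_lhs(x, y, w_x, w_y, K)[..., None]

rng = np.random.default_rng(0)
K, n, num = -1.5, 4, 1000
x = random_hyperboloid_points(rng, num, n, K)
y = random_hyperboloid_points(rng, num, n, K)
w_x = rng.uniform(0.1, 2.0, size=num)  # strictly positive weights (assumption)
w_y = rng.uniform(0.1, 2.0, size=num)

lhs = lemma_lhs(x, y, w_x, w_y, K)
rhs = np.sqrt(w_x**2 + w_y**2)
z = lorentz_residual(x, y, w_x, w_y, K)

print("lemma holds on all samples:", bool(np.all(lhs > rhs)))
print("output stays on hyperboloid:", bool(np.allclose(lorentz_inner(z, z), 1.0 / K)))
```

Because the left-hand side is bounded away from zero, the division in `lorentz_residual` is well defined, and the output satisfies the same hyperboloid constraint as its inputs without any logarithmic or exponential map.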

Figures (3)

  • Figure 1: Visualization of hyperbolic residual connection methods. From left to right: (a) parallel transport-based method, (b) tangent space-based method, and (c) the proposed LResNet. Superscripts $\mathbb{H}$ and $\mathbb{T}$ mark points in hyperbolic space and tangent space, respectively. In each subfigure, $\mathbf{z}^\mathbb{H}$ represents the sum of points $\mathbf{x}^\mathbb{H},\mathbf{y}^\mathbb{H}\in\mathbb{H}$. PT denotes parallel transport, and log and exp denote the logarithmic and exponential maps, respectively. The proposed LResNet overcomes limitations L(i, ii, iii, iv) by eliminating these maps and parallel transport (their absence is shown via $\checkmark$), whereas the other two methods depend on at least one of them (shown via $\times$).
  • Figure 2: Adaptation of LResNet to (a) a 3-layer GNN architecture, (b) a residual block for vision tasks, and (c) a graph transformer. $\mathcal{G}$ represents graph convolutional layers in (a) and a convolutional layer implemented with HL in (b). $\bigoplus$ is a hyperbolic residual connection.
  • Figure 3: Comparison of ROC AUC (%) differences for link prediction (LP) between HyboNet with and without residual connections. Orange indicates LResNet as the residual connection, and blue indicates the tangent space method as the residual connection. Results are shown in separate panels for the Disease dataset and the Airport dataset.

Theorems & Definitions (3)

  • Lemma 4.1
  • Proposition 4.2
  • Theorem 4.3