Hyperbolic Optimization

Yanke Wang, Kyriakos Flouris

TL;DR

This work investigates optimization in non-Euclidean geometry by operating on the Poincaré ball to exploit hierarchical data structure. It introduces a Riemannian extension of AdamW and a HyperSGD variant, using gradient scaling $\mathfrak{g}_t = \frac{(1-\|\theta_{t-1}\|^2)^2}{4} g_t$ and projection-based updates to perform hyperbolic optimization. The methods are applied to denoising diffusion probabilistic models with hyperbolic time discretization of Langevin dynamics, reporting faster early convergence and, in some configurations, improved sample quality on a butterfly image dataset. The results indicate that hyperbolic optimization can yield practical benefits for non-Euclidean parameter spaces and diffusion-model training, with potential generalization to other architectures.

Abstract

This work explores optimization methods on hyperbolic manifolds. Building on Riemannian optimization principles, we extend Hyperbolic Stochastic Gradient Descent (a specialization of Riemannian SGD) to a Hyperbolic Adam optimizer. While these methods are particularly relevant for learning on the Poincaré ball, they may also provide benefits in Euclidean and other non-Euclidean settings, as the chosen optimization encourages the learning of Poincaré embeddings. This representation, in turn, accelerates convergence in the early stages of training, when parameters are far from the optimum. As a case study, we train diffusion models using the hyperbolic optimization methods with hyperbolic time-discretization of the Langevin dynamics, and show that they achieve faster convergence on certain datasets without sacrificing generative quality.
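
To make the gradient scaling quoted in the TL;DR concrete, the following is a minimal sketch of one hyperbolic SGD step on the Poincaré ball, not the authors' implementation. The factor $\frac{(1-\|\theta\|^2)^2}{4}$ is the inverse of the conformal factor $\left(\frac{2}{1-\|\theta\|^2}\right)^2$ of the Poincaré ball metric, so multiplying the Euclidean gradient by it gives the Riemannian gradient. The sketch assumes the update is a plain step along the rescaled gradient followed by a projection back into the open unit ball; the names `hyper_sgd_step`, `project`, `lr`, and the margin `EPS` are hypothetical and chosen only for illustration.

```python
import math

EPS = 1e-5  # hypothetical margin keeping points strictly inside the unit ball


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def riemannian_grad(theta, g):
    """Rescale the Euclidean gradient g at theta by (1 - ||theta||^2)^2 / 4."""
    scale = (1.0 - sum(x * x for x in theta)) ** 2 / 4.0
    return [scale * gi for gi in g]


def project(theta):
    """Pull theta back inside the open unit ball if a step left it."""
    n = norm(theta)
    if n >= 1.0:
        return [x / n * (1.0 - EPS) for x in theta]
    return theta


def hyper_sgd_step(theta, g, lr):
    """One step: rescale the gradient, move against it, then project."""
    rg = riemannian_grad(theta, g)
    return project([t - lr * r for t, r in zip(theta, rg)])


if __name__ == "__main__":
    # Toy objective (an assumption for illustration): f(theta) = ||theta - target||^2
    target = [0.5, -0.3]
    theta = [0.0, 0.0]
    for _ in range(200):
        grad = [2.0 * (t - c) for t, c in zip(theta, target)]
        theta = hyper_sgd_step(theta, grad, lr=0.5)
    print(theta)
```

Because the scaling shrinks toward zero as $\|\theta\|$ approaches 1, steps become smaller near the boundary of the ball; the projection only acts as a safeguard for steps that would otherwise leave it.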

Paper Structure

This paper contains 10 sections, 11 equations, 9 figures, 2 algorithms.

Figures (9)

  • Figure 1: The Euclidean and hyperbolic optimizations.
  • Figure 2: FID comparison of SGD and hyperbolic SGD optimizers with 200 inference steps.
  • Figure 3: FID comparison of AdamW and hyperbolic AdamW optimizers with 200 and 50 inference steps, respectively.
  • Figure 4: Training DDPM via SGD (lr=2e-3), increasing epochs (10, 20, ..., 500).
  • Figure 5: Training DDPM via SGD (lr=2e-3) based on T sampled in a unit hyperbola, increasing epochs (10, 20, ..., 500).
  • ...and 4 more figures