Hyperbolic Optimization
Yanke Wang, Kyriakos Flouris
TL;DR
This work investigates optimization in non-Euclidean geometry, operating on the Poincaré ball to exploit hierarchical data structure. It introduces a Riemannian extension of AdamW and a HyperSGD variant. Both rescale the Euclidean gradient $g_t$ by the inverse Poincaré metric factor, $\mathfrak{g}_t = \frac{(1-\|\theta_{t-1}\|^2)^2}{4} g_t$, and apply projection-based updates. The methods are applied to denoising diffusion probabilistic models with a hyperbolic time discretization of Langevin dynamics, reporting faster early convergence and, in some configurations, improved sample quality on a butterfly image dataset. The results indicate that hyperbolic optimization can yield practical benefits for non-Euclidean parameter spaces and diffusion-model training, with potential generalization to other architectures.
Abstract
This work explores optimization methods on hyperbolic manifolds. Building on Riemannian optimization principles, we extend Hyperbolic Stochastic Gradient Descent (a specialization of Riemannian SGD) to a Hyperbolic Adam optimizer. While these methods are particularly relevant for learning on the Poincaré ball, they may also provide benefits in Euclidean and other non-Euclidean settings, because the chosen optimization encourages the learning of Poincaré embeddings. This representation, in turn, accelerates convergence in the early stages of training, when parameters are far from the optimum. As a case study, we train diffusion models using the hyperbolic optimization methods with a hyperbolic time discretization of the Langevin dynamics, and show that they achieve faster convergence on certain datasets without sacrificing generative quality.
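The gradient rescaling by $\frac{(1-\|\theta_{t-1}\|^2)^2}{4}$ and the projection-based update can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain first-order update followed by a projection back into the open unit ball, and the names `riemannian_grad`, `project`, `hyper_sgd_step`, the margin `EPS`, and the toy quadratic objective are all hypothetical choices made for illustration.

```python
import math

EPS = 1e-5  # hypothetical margin keeping points strictly inside the unit ball


def riemannian_grad(theta, grad):
    """Rescale a Euclidean gradient by (1 - ||theta||^2)^2 / 4."""
    sq_norm = sum(x * x for x in theta)
    scale = (1.0 - sq_norm) ** 2 / 4.0
    return [scale * g for g in grad]


def project(theta, eps=EPS):
    """Pull a point back inside the open unit ball if it has left it."""
    norm = math.sqrt(sum(x * x for x in theta))
    if norm >= 1.0:
        factor = (1.0 - eps) / norm
        return [factor * x for x in theta]
    return list(theta)


def hyper_sgd_step(theta, grad, lr=0.1):
    """One assumed update: theta <- project(theta - lr * rescaled gradient)."""
    rg = riemannian_grad(theta, grad)
    return project([x - lr * g for x, g in zip(theta, rg)])


# Toy usage (assumption): minimize f(theta) = ||theta - target||^2 in the ball.
target = [0.5, 0.0]
theta = [0.0, 0.0]
for _ in range(200):
    grad = [2.0 * (x - t) for x, t in zip(theta, target)]  # Euclidean gradient
    theta = hyper_sgd_step(theta, grad, lr=0.5)
```

Because the scaling factor shrinks toward zero as $\|\theta\|$ approaches 1, steps become smaller near the boundary of the ball; the projection only takes effect if an update would leave the ball. An Adam-style variant would presumably feed the rescaled gradient into the moment estimates, but that detail is not specified here.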
