Accelerated Natural Gradient Method for Parametric Manifold Optimization
Chenyi Li, Shuchen Zhu, Zhonglin Xie, Zaiwen Wen
TL;DR
This work addresses parametric manifold optimization, where state functions live on infinite-dimensional manifolds, by introducing Accelerated Natural Gradient Descent (ANGD). ANGD combines an Accelerated Riemannian Gradient (ARG) flow with Hessian-driven damping and a two-stage discretization that maps manifold updates to parameter updates. The ARG flow attains an $O(t^{-2})$ convergence rate under geodesic convexity. The framework supports multiple metrics, including $L^2$, $H^s$, Fisher-Rao, and Wasserstein-2, and uses Kronecker-based preconditioning to scale to large parameter spaces. The theory establishes the equivalence between the continuous ARG flow and a first-order system and provides convergence guarantees. Numerical experiments on Burgers' equation, the Euler equations, and variational quantum Monte Carlo show substantial acceleration over standard natural gradient methods, which is relevant to PDE-constrained learning and physics-informed optimization.
Abstract
Parametric manifold optimization problems arise frequently in machine learning tasks in which state functions are defined on infinite-dimensional manifolds. We propose a unified accelerated natural gradient descent (ANGD) framework to address these problems. By incorporating a Hessian-driven damping term into the manifold update, we derive an accelerated Riemannian gradient (ARG) flow that mitigates oscillations. We further present an equivalent first-order system for the ARG flow, which enables a unified discretization scheme that leads to the ANGD method. The discrete update incorporates several techniques: a least-squares approximation of the update direction, projected momentum to accelerate convergence, and efficient approximation through the Kronecker product. The framework accommodates various metrics, including the $H^s$, Fisher-Rao, and Wasserstein-2 metrics, and provides a computationally efficient solution for large-scale parameter spaces. We establish a convergence rate for the ARG flow under geodesic convexity assumptions. Numerical experiments demonstrate that ANGD outperforms standard NGD across diverse deep learning tasks.
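
To make the idea of a least-squares update direction combined with momentum concrete, the following is a minimal sketch and not the ANGD algorithm itself. It assumes a toy model that is linear in its parameters, an $L^2$ metric approximated at sample points, and plain heavy-ball momentum in place of the projected momentum and Hessian-driven damping described above. All names (`model`, `jacobian`, `ngd_with_momentum`, `step`, `beta`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def features(x):
    # Hypothetical toy basis: u_theta(x) = theta[0] + theta[1]*x + theta[2]*x**2.
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def model(theta, x):
    return features(x) @ theta

def jacobian(theta, x):
    # Jacobian of the sampled state u_theta(x_i) with respect to theta.
    # Constant here because the toy model is linear in theta.
    return features(x)

def loss(theta, x, target):
    return 0.5 * np.mean((model(theta, x) - target) ** 2)

def ngd_with_momentum(theta0, x, target, step=0.5, beta=0.5, iters=200):
    # Assumption: heavy-ball momentum stands in for the paper's projected momentum.
    theta = theta0.astype(float).copy()
    velocity = np.zeros_like(theta)
    for _ in range(iters):
        # Descent direction in function space (L^2 metric, sampled at x).
        residual = model(theta, x) - target
        J = jacobian(theta, x)
        # Least-squares pullback: find d minimizing ||J d + residual||_2.
        direction, *_ = np.linalg.lstsq(J, -residual, rcond=None)
        velocity = beta * velocity + step * direction
        theta = theta + velocity
    return theta

x = np.linspace(-1.0, 1.0, 50)
theta_true = np.array([0.5, -1.0, 2.0])
target = model(theta_true, x)
theta_hat = ngd_with_momentum(np.zeros(3), x, target)
print(loss(theta_hat, x, target))
```

Because the toy Jacobian is constant and has full column rank, the least-squares direction points exactly at the minimizer, so the iteration reduces to a damped linear recursion in the parameter error. For nonlinear parametrizations such as neural networks, the Jacobian changes at every step and the step size and momentum would need tuning.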
