Accelerated Natural Gradient Method for Parametric Manifold Optimization

Chenyi Li, Shuchen Zhu, Zhonglin Xie, Zaiwen Wen

TL;DR

This work tackles parametric manifold optimization, where state functions live on infinite-dimensional spaces, by introducing Accelerated Natural Gradient Descent (ANGD). ANGD combines an Accelerated Riemannian Gradient (ARG) flow with Hessian-driven damping and a two-stage discretization that maps manifold updates to parameter updates, achieving an $O(t^{-2})$ convergence rate under geodesic convexity. The framework supports multiple metrics, including $L^2$, $H^s$, Fisher-Rao, and Wasserstein-2, and employs Kronecker-based preconditioning to scale to large parameter spaces. Theoretical results establish the equivalence between the continuous ARG flow and a first-order system and provide convergence guarantees, while numerical experiments on Burgers' equation, the Euler equations, and variational quantum Monte Carlo demonstrate substantial acceleration over standard natural gradient methods, underscoring the method's practical relevance to PDE-constrained learning and physics-informed optimization.
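Kronecker-based preconditioning avoids inverting a full metric (Gram or Fisher) matrix over all parameters. The sketch below shows the generic idea for a single dense layer in the style of K-FAC: the layer's block is approximated as $A \otimes G$ from input and output-gradient statistics, so applying its inverse costs two small solves. This is an illustration under assumptions, not the paper's exact preconditioner: the factor definitions, the per-factor damping, and the helper names `kron_factors` and `kron_precondition` are hypothetical.

```python
import numpy as np

def kron_factors(acts, backprops):
    """Kronecker factors of the Fisher/Gram block for one dense layer (K-FAC style).

    acts:      (n, d_in)  layer inputs a_i
    backprops: (n, d_out) per-sample gradients w.r.t. the layer outputs, delta_i
    The per-sample weight gradient is delta_i a_i^T, and the block
    E[vec(g) vec(g)^T] is approximated by A ⊗ G (column-major vec).
    """
    n = acts.shape[0]
    A = acts.T @ acts / n            # E[a a^T], shape (d_in, d_in)
    G = backprops.T @ backprops / n  # E[delta delta^T], shape (d_out, d_out)
    return A, G


def kron_precondition(grad_W, A, G, damping=1e-3):
    """Return G_d^{-1} grad_W A_d^{-1}, which equals (A_d ⊗ G_d)^{-1} vec(grad_W).

    Damping is added to each factor separately (one common choice, assumed here).
    Two small solves cost O(d_in^3 + d_out^3) instead of O((d_in * d_out)^3).
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    left = np.linalg.solve(G_d, grad_W)    # G_d^{-1} grad_W
    return np.linalg.solve(A_d, left.T).T  # left A_d^{-1}, using that A_d is symmetric


# Usage on synthetic data (hypothetical shapes).
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 5))
backprops = rng.normal(size=(256, 3))
A, G = kron_factors(acts, backprops)
grad_W = backprops.T @ acts / 256  # mean weight gradient, shape (3, 5)
step = kron_precondition(grad_W, A, G)

# Cross-check against the explicit 15 x 15 solve with column-major vec.
A_d = A + 1e-3 * np.eye(5)
G_d = G + 1e-3 * np.eye(3)
explicit = np.linalg.solve(np.kron(A_d, G_d), grad_W.reshape(-1, order="F"))
print(np.allclose(step.reshape(-1, order="F"), explicit))  # True
```

The check relies on the identity $(A \otimes G)\,\mathrm{vec}(X) = \mathrm{vec}(G X A^\top)$ for column-major $\mathrm{vec}$, which is what makes the factored inverse cheap.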

Abstract

Parametric manifold optimization problems frequently arise in machine learning tasks, where state functions are defined on infinite-dimensional manifolds. We propose a unified accelerated natural gradient descent (ANGD) framework to address these problems. By incorporating a Hessian-driven damping term into the manifold update, we derive an accelerated Riemannian gradient (ARG) flow that mitigates oscillations. We further present an equivalent first-order system for the ARG flow, which enables a unified discretization scheme and leads to the ANGD method. The discrete update incorporates several techniques: a least-squares approximation of the update direction, projected momentum to accelerate convergence, and efficient approximations based on the Kronecker product. The framework accommodates various metrics, including the $H^s$, Fisher-Rao, and Wasserstein-2 metrics, and remains computationally efficient for large-scale parameter spaces. We establish an $O(t^{-2})$ convergence rate for the ARG flow under geodesic convexity assumptions. Numerical experiments demonstrate that ANGD outperforms standard NGD, underscoring its effectiveness across diverse deep learning tasks.
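The least-squares step that maps a manifold (function-space) update to a parameter update can be illustrated with the $L^2$ metric: given a desired change $v$ of the state function at sample points and the model Jacobian $J$, choose the parameter direction $d$ whose induced change $J d$ is closest to $v$. The sketch below assumes the $L^2$ metric, sampled collocation points, and optional Tikhonov damping; `ls_direction` and the quadratic toy model are hypothetical, and the paper's formulation for other metrics may weight the residual differently.

```python
import numpy as np

def ls_direction(J, v, damping=0.0):
    """Parameter direction d minimising ||J d - v||^2 + damping * ||d||^2.

    J: (m, p) Jacobian of the model outputs u_theta(x_j) w.r.t. theta at m sample points
    v: (m,)   desired function-space update evaluated at the same points
    Solved as an augmented least-squares problem, which avoids forming J^T J.
    """
    p = J.shape[1]
    J_aug = np.vstack([J, np.sqrt(damping) * np.eye(p)])
    v_aug = np.concatenate([v, np.zeros(p)])
    d, *_ = np.linalg.lstsq(J_aug, v_aug, rcond=None)
    return d


# Toy problem (hypothetical): fit u_theta(x) = theta0 + theta1 x + theta2 x^2 to f
# by minimising L(u) = 0.5 * ||u - f||_{L^2}^2, whose L^2 gradient is u - f.
x = np.linspace(0.0, 1.0, 50)
f = 1.0 - 2.0 * x + 3.0 * x**2

def model(theta, x):
    return theta[0] + theta[1] * x + theta[2] * x**2

theta = np.zeros(3)
J = np.stack([np.ones_like(x), x, x**2], axis=1)  # du/dtheta; constant because the model is linear in theta
v = -(model(theta, x) - f)                        # function-space steepest-descent direction at the samples
theta = theta + ls_direction(J, v)                # unit step along the projected direction
print(theta)  # approximately [1, -2, 3]: f lies in the model class, so one projected step is exact
```

With `damping > 0` the same routine returns $(J^\top J + \lambda I)^{-1} J^\top v$, the damped natural-gradient step for the $L^2$ metric in this sketch.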

Paper Structure

This paper contains 28 sections, 3 theorems, 105 equations, 4 figures, 2 tables, 3 algorithms.

Key Result

Proposition 1

Define $\Phi_t=\Psi_t-\beta_t \frac{\delta L}{\delta \rho_t}$ and the Riemannian correction term (equation not shown). The second-order accelerated Riemannian gradient (ARG) flow is then equivalent to a first-order system in $(\rho_t, \Psi_t)$ (equation not shown) with initial values $\rho_0$ and $\Psi_0=\beta_0\frac{\delta L}{\delta \rho_0}$.
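The point of Proposition 1 is that a flow with Hessian-driven damping can be rewritten as a first-order system that never forms the Hessian. The Euclidean analogue below is our own illustration, not the paper's Riemannian system: for $\ddot x + \frac{\alpha}{t}\dot x + \beta\nabla^2 f(x)\dot x + \nabla f(x) = 0$ with constant $\beta$, set $\psi = \dot x + \beta\nabla f(x)$. Since $\frac{d}{dt}\nabla f(x) = \nabla^2 f(x)\dot x$, the Hessian term is absorbed into $\dot\psi$, giving $\dot x = \psi - \beta\nabla f(x)$ and $\dot\psi = -\frac{\alpha}{t}(\psi - \beta\nabla f(x)) - \nabla f(x)$, with $\psi(t_0) = \beta\nabla f(x_0)$ when $\dot x(t_0) = 0$. The quadratic objective, $\alpha = 3$, $\beta = 0.1$, and the explicit Euler scheme are assumptions chosen for the demo.

```python
import numpy as np

# Illustrative ill-conditioned quadratic f(x) = 0.5 * sum(D * x**2); its Hessian is diag(D).
D = np.array([1.0, 100.0])

def f(x):
    return 0.5 * np.sum(D * x**2)

def grad(x):
    return D * x


def first_order(x0, alpha=3.0, beta=0.1, t0=1.0, T=20.0, dt=1e-3):
    """Explicit Euler on the Hessian-free system
         x'   = psi - beta * grad(x)
         psi' = -(alpha / t) * (psi - beta * grad(x)) - grad(x),
    started at rest (x'(t0) = 0), i.e. psi(t0) = beta * grad(x0)."""
    x = x0.copy()
    psi = beta * grad(x)
    for k in range(int(round((T - t0) / dt))):
        t = t0 + k * dt
        g = grad(x)
        xdot = psi - beta * g
        x, psi = x + dt * xdot, psi + dt * (-(alpha / t) * xdot - g)
    return x


def second_order(x0, alpha=3.0, beta=0.1, t0=1.0, T=20.0, dt=1e-3):
    """Explicit Euler on x'' + (alpha / t) x' + beta * Hess f(x) x' + grad f(x) = 0,
    which needs the Hessian (here diag(D)) at every step."""
    x, xdot = x0.copy(), np.zeros_like(x0)
    for k in range(int(round((T - t0) / dt))):
        t = t0 + k * dt
        xddot = -(alpha / t) * xdot - beta * D * xdot - grad(x)
        x, xdot = x + dt * xdot, xdot + dt * xddot
    return x


x0 = np.array([1.0, 1.0])
x_T = first_order(x0)
print(f(x0), f(x_T))  # the objective drops by several orders of magnitude
# Both formulations track the same trajectory up to O(dt) discretization error.
print(np.abs(first_order(x0, T=3.0) - second_order(x0, T=3.0)).max())
```

The initial condition $\psi(t_0) = \beta\nabla f(x_0)$ plays the same role here as $\Psi_0 = \beta_0\frac{\delta L}{\delta \rho_0}$ in the proposition: it encodes starting at rest.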

Figures (4)

  • Figure 1: Numerical results for boundary condition $h(x) = \sin(\pi x)$
  • Figure 2: Numerical results for boundary condition $h(x) = 1 - \cos(2\pi x)$
  • Figure 3: Numerical results for solving the Euler equations
  • Figure 4: Numerical results of VMC on the molecules $\text{Be},\text{Li}_2,\text{H}_{10},\text{N}_2$. We use "FR" and "W2" to denote the Fisher-Rao and Wasserstein-2 metrics, respectively, for brevity.

Theorems & Definitions (12)

  • Proposition 1
  • Example 1: $L^2$ ARG flow
  • Example 2: $H^s$ ($s\ge0$) ARG flow
  • Example 3: $H^s$ ($s<0$) ARG flow
  • Example 4: Fisher-Rao ARG flow
  • Example 5: Wasserstein ARG flow
  • Remark 1
  • proof
  • Lemma 1
  • proof
  • ...and 2 more
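Examples 2 and 3 in the list above name $H^s$ ARG flows for $s \ge 0$ and $s < 0$. On a periodic domain, one standard way to turn an $L^2$ gradient into an $H^s$ gradient is to multiply its Fourier coefficients by $(1+|\xi|^2)^{-s}$, i.e. apply $(I-\Delta)^{-s}$: positive $s$ smooths the direction and negative $s$ sharpens it. This is a sketch assuming a uniform periodic grid and this particular normalization of the $H^s$ norm; `hs_gradient` is a hypothetical helper, and the paper's experiments may use other domains or boundary conditions.

```python
import numpy as np

def hs_gradient(g_l2, s, length=1.0):
    """Map an L^2 gradient sampled on a uniform periodic grid to its H^s Riesz representative.

    Assumes the H^s inner product <u, v>_s = sum_k (1 + xi_k^2)^s u_hat_k conj(v_hat_k),
    so the H^s gradient is (I - Laplacian)^(-s) g: s > 0 smooths, s < 0 sharpens.
    """
    n = g_l2.shape[0]
    xi = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers 2*pi*k/length
    symbol = (1.0 + xi**2) ** (-s)
    return np.real(np.fft.ifft(symbol * np.fft.fft(g_l2)))


x = np.linspace(0.0, 1.0, 64, endpoint=False)
g = np.sin(2 * np.pi * x) + 0.1 * np.sin(2 * np.pi * 16 * x)  # smooth part plus a high-frequency wiggle
smooth = hs_gradient(g, s=1.0)   # high-frequency mode damped far more than the low-frequency one
rough = hs_gradient(g, s=-1.0)   # high-frequency mode amplified
```

Each Fourier mode is simply rescaled, so the choice of $s$ acts as a frequency-dependent preconditioner on the descent direction.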