Quantitative Convergences of Lie Group Momentum Optimizers

Lingkai Kong, Molei Tao

TL;DR

The paper develops two Lie-group momentum optimizers, Lie Heavy-Ball and Lie NAG-SC, derived via variational principles and left-trivialization so that they need only a gradient oracle and the exponential map. Under $L$-smoothness and local geodesic $\mu$-strong convexity, Lie Heavy-Ball achieves a non-accelerated linear rate, while Lie NAG-SC attains acceleration with a $\sqrt{\kappa}$-type dependence, where $\kappa=L/\mu$, albeit with a curvature term $p(a)$ reflecting the Lie-group geometry. The discretizations stay on the manifold via splitting and Euclidean-inspired momentum, avoiding costly operations such as the logarithm map and parallel transport, which improves practicality on Lie groups such as $\mathsf{SO}(n)$. The theoretical results are supported by systematic numerical tests on eigenvalue decomposition problems, which show that Lie NAG-SC outperforms Lie Heavy-Ball on ill-conditioned tasks and validate the proposed rates and the role of curvature.

Abstract

Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure-preserving time discretizations can then turn these dynamics into optimization algorithms. This article investigates two types of discretization: Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly proposed. Their convergence rates are explicitly quantified under $L$-smoothness and local strong convexity assumptions. Lie NAG-SC provides acceleration over the momentumless case, i.e., Riemannian gradient descent, but Lie Heavy-Ball does not. Compared to existing accelerated optimizers for general manifolds, both Lie Heavy-Ball and Lie NAG-SC are computationally cheaper and easier to implement, thanks to their use of the group structure. Only a gradient oracle and the exponential map are required; the logarithm map and parallel transport, which are computationally costly, are not.
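As a rough illustration of an optimizer that uses only a gradient oracle and the exponential map, here is a minimal sketch of a heavy-ball-style iteration on $\mathsf{SO}(3)$. It is not a transcription of the paper's algorithm. The update form $\xi_{k+1}=(1-\gamma h)\xi_k - h\,\nabla_{\text{triv}} f(g_k)$, $g_{k+1}=g_k\exp(h\,\xi_{k+1})$ is an assumption about the general shape of such a scheme. The toy objective $f(g)=3-\operatorname{tr}(g)$, the step size `h`, the friction `gamma`, and all function names are hypothetical choices made for this example.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def exp_so3(xi):
    """Matrix exponential of a skew-symmetric 3x3 matrix (Rodrigues' formula)."""
    theta = math.sqrt(sum(xi[i][j] ** 2 for i in range(3) for j in range(3)) / 2.0)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # leading Taylor coefficients for tiny angles
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    xi2 = matmul(xi, xi)
    return [[(1.0 if i == j else 0.0) + a * xi[i][j] + b * xi2[i][j]
             for j in range(3)] for i in range(3)]

def f(g):
    """Toy objective (an assumption, not the paper's test problem): minimal at g = I."""
    return 3.0 - (g[0][0] + g[1][1] + g[2][2])

def trivialized_grad(g):
    """Left-trivialized gradient of f w.r.t. <A, B> = tr(A^T B): (g - g^T) / 2."""
    return [[0.5 * (g[i][j] - g[j][i]) for j in range(3)] for i in range(3)]

def lie_heavy_ball(g0, h, gamma, steps):
    """Heavy-ball-style iteration: momentum lives in the Lie algebra,
    and the position is updated by right-multiplying with an exponential."""
    g = g0
    xi = [[0.0] * 3 for _ in range(3)]
    for _ in range(steps):
        grad = trivialized_grad(g)
        xi = [[(1.0 - gamma * h) * xi[i][j] - h * grad[i][j] for j in range(3)]
              for i in range(3)]
        step = [[h * xi[i][j] for j in range(3)] for i in range(3)]
        g = matmul(g, exp_so3(step))  # stays on SO(3) up to round-off
    return g

g_start = exp_so3(hat([0.6, -0.4, 0.5]))
g_final = lie_heavy_ball(g_start, h=0.1, gamma=1.0, steps=500)
print("f(start) =", f(g_start), " f(final) =", f(g_final))
```

Because each step multiplies by an exact rotation, the iterate remains orthogonal without any retraction, projection, logarithm map, or parallel transport. The momentum is a plain vector in the Lie algebra.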

Paper Structure (21 sections, 23 theorems, 120 equations, 2 figures, 2 tables, 1 algorithm)

Key Result

Lemma 3

Under the paper's general assumption (source label: assumption_general), there exists an inner product on $\mathfrak{g}$ such that the operator $\operatorname{ad}$ is skew-adjoint, i.e., $\operatorname{ad}^*_\xi=-\operatorname{ad}_\xi$ for any $\xi \in \mathfrak{g}$.
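The skew-adjointness property can be checked numerically in a concrete case. The sketch below assumes the Lie algebra $\mathfrak{so}(3)$ with the inner product $\langle A,B\rangle=\operatorname{tr}(A^\top B)$, which is a choice made for this example and not necessarily the inner product constructed in the paper. It verifies $\langle \operatorname{ad}_\xi\eta,\zeta\rangle = -\langle \eta,\operatorname{ad}_\xi\zeta\rangle$ on random elements. All helper names are hypothetical.

```python
import random

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba, i.e. ad_a applied to b."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

def inner(a, b):
    """Frobenius inner product tr(a^T b)."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))

random.seed(0)  # fixed seed so the check is reproducible
xi, eta, zeta = (hat([random.uniform(-1.0, 1.0) for _ in range(3)])
                 for _ in range(3))

lhs = inner(bracket(xi, eta), zeta)    # <ad_xi eta, zeta>
rhs = -inner(eta, bracket(xi, zeta))   # -<eta, ad_xi zeta>
residual = abs(lhs - rhs)
print("residual of the skew-adjointness identity:", residual)
```

One practical consequence is that $\operatorname{ad}^*_\xi\xi=-[\xi,\xi]=0$, so a term of this form drops out of trivialized momentum dynamics under such an inner product.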

Figures (2)

  • Figure 1: Fig. \ref{fig_EV_compare_local} shows that 1) Lie NAG-SC converges much faster than Lie Heavy-Ball on ill-conditioned problems; 2) the fitted dashed curve and the experimental results align well, supporting our theoretical analysis of the convergence rates $c_{\text{HB}}$ and $c_{\text{NAG-SC}}$. Fig. \ref{fig_EV_compare_global} shows the experimental performance of our algorithms on a non-convex problem. In this specific experiment, Lie NAG-SC outperforms Lie Heavy-Ball and finds the global minimum without being trapped in local minima. However, we are not sure which is better for general optimization problems. One possible reason for the good performance of NAG-SC is that it uses a larger learning rate, which helps it escape local minima. The values of the Lyapunov function along the trajectory are not provided since it is not globally defined.
  • Figure 2: Local convergence of Lie Heavy-Ball and Lie NAG-SC on the eigenvalue decomposition problem with different condition numbers. The initialization is close to the global minimum. The dashed curves are the values of the potential function along the trajectory and the solid curves are the values of the corresponding Lyapunov functions. For Lie GD (Eq. \ref{eqn_GD}), $h$ is chosen as $1/L$ [zhang2016first]. We observe: 1. Lie NAG-SC converges much faster than Lie Heavy-Ball, especially on ill-conditioned problems. 2. Although the potential function is not monotonically decreasing, the Lyapunov function is.

Theorems & Definitions (53)

  • Remark 1: Triviality of convex functions on Lie groups
  • Lemma 3: $\operatorname{ad}$ skew-adjoint [milnor1976curvatures]
  • Definition 4: $L$-smoothness
  • Definition 5: Local geodesic strong convexity
  • Theorem 6: Monotonic decrease of total energy [tao2020variational]
  • Definition 7: $u$-sublevel set
  • Corollary 8
  • Theorem 9: Convergence rate of the optimization ODE
  • Remark 10
  • Theorem 11: Monotonic decrease of the modified energy of Heavy-Ball
  • ...and 43 more