Table of Contents
Fetching ...

Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds

Liwei Hu, Guangyao Li, Wenyong Wang, Xiaoming Zhang, Yu Xiang

TL;DR

Geodesic gradient descent is proposed, a generic and learning-rate-free Riemannian gradient descent algorithm that achieves test MSE reductions ranging from 35.79% to 48.76% for fully connected networks on the Burgers'dataset, and cross-entropy loss reductions ranging from 3.14% to 11.59% for convolutional neural networks on the MNIST dataset.

Abstract

Euclidean gradient descent algorithms barely capture the geometry of objective function-induced hypersurfaces and risk driving update trajectories off the hypersurfaces. Riemannian gradient descent algorithms address these issues but fail to represent complex hypersurfaces via a single classic manifold. We propose geodesic gradient descent (GGD), a generic and learning-rate-free Riemannian gradient descent algorithm. At each iteration, GGD uses an n-dimensional sphere to approximate a local neighborhood on the objective function-induced hypersurface, adapting to arbitrarily complex geometries. A tangent vector derived from the Euclidean gradient is projected onto the sphere to form a geodesic, ensuring the update trajectory stays on the hypersurface. Parameter updates are performed using the endpoint of the geodesic. The maximum step size of the gradient in GGD is equal to a quarter of the arc length on the n-dimensional sphere, thus eliminating the need for a learning rate. Experimental results show that compared with the classic Adam algorithm, GGD achieves test MSE reductions ranging from 35.79% to 48.76% for fully connected networks on the Burgers' dataset, and cross-entropy loss reductions ranging from 3.14% to 11.59% for convolutional neural networks on the MNIST dataset.

Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds

TL;DR

Geodesic gradient descent is proposed, a generic and learning-rate-free Riemannian gradient descent algorithm that achieves test MSE reductions ranging from 35.79% to 48.76% for fully connected networks on the Burgers'dataset, and cross-entropy loss reductions ranging from 3.14% to 11.59% for convolutional neural networks on the MNIST dataset.

Abstract

Euclidean gradient descent algorithms barely capture the geometry of objective function-induced hypersurfaces and risk driving update trajectories off the hypersurfaces. Riemannian gradient descent algorithms address these issues but fail to represent complex hypersurfaces via a single classic manifold. We propose geodesic gradient descent (GGD), a generic and learning-rate-free Riemannian gradient descent algorithm. At each iteration, GGD uses an n-dimensional sphere to approximate a local neighborhood on the objective function-induced hypersurface, adapting to arbitrarily complex geometries. A tangent vector derived from the Euclidean gradient is projected onto the sphere to form a geodesic, ensuring the update trajectory stays on the hypersurface. Parameter updates are performed using the endpoint of the geodesic. The maximum step size of the gradient in GGD is equal to a quarter of the arc length on the n-dimensional sphere, thus eliminating the need for a learning rate. Experimental results show that compared with the classic Adam algorithm, GGD achieves test MSE reductions ranging from 35.79% to 48.76% for fully connected networks on the Burgers' dataset, and cross-entropy loss reductions ranging from 3.14% to 11.59% for convolutional neural networks on the MNIST dataset.
Paper Structure (16 sections, 23 equations, 9 figures, 4 tables, 1 algorithm)

This paper contains 16 sections, 23 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: The convergence direction in Euclidean space.
  • Figure 2: The Riemannian gradient fei2025survey.
  • Figure 3: The convergence process of GGD algorithm.
  • Figure 4: The variation curve of RBFs.
  • Figure 5: Different vectors in the tangent space of a circle and their corresponding projected geodesics on the circle.
  • ...and 4 more figures