Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks

Yongchang Hao; Yanshuai Cao; Lili Mou

TL;DR

Ginger makes second-order optimization tractable for deep neural networks by directly maintaining an approximation to the inverse of the generalized Gauss--Newton (GGN) matrix in a damped, low-rank eigendecomposition form. It models $G_{t,\gamma}$ as $\gamma I + U_t \operatorname{diag}(\sigma_t) U_t^\top$, which allows efficient Woodbury-based inverse-vector products and an online, EMA-driven update of the low-rank basis via fast truncated SVD updates. The paper establishes convergence to stationary points for non-convex objectives. Experiments on CIFAR-100 (ResNet-18/50) and XSUM (T5-small with and without LoRA) show that Ginger achieves superior or competitive results while keeping time and space complexity linear in model size. Overall, Ginger offers a scalable alternative to full-matrix second-order methods for general architectures, and its code is publicly released.
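To make the Woodbury-based inverse-vector product concrete, here is a minimal NumPy sketch. It assumes the damped low-rank form above, with $U_t$ having orthonormal columns and $\sigma_t \ge 0$; the function name `damped_low_rank_solve` and its arguments are hypothetical and are not taken from the released code.

```python
import numpy as np


def damped_low_rank_solve(U, sigma, gamma, g):
    """Compute G^{-1} g for G = gamma * I + U diag(sigma) U^T.

    Assumptions (illustrative, not the authors' implementation):
      - U has shape (d, tau) with orthonormal columns,
      - sigma has shape (tau,) with non-negative entries,
      - gamma > 0 is the damping constant.

    With orthonormal U, the Woodbury identity reduces to
      G^{-1} = I / gamma - U diag(sigma / (gamma * (gamma + sigma))) U^T,
    so the product costs O(d * tau) time and no d x d matrix is formed.
    """
    coeff = sigma / (gamma * (gamma + sigma))  # shape (tau,)
    projected = U.T @ g                        # shape (tau,), O(d * tau)
    return g / gamma - U @ (coeff * projected)
```

Only the $d \times \tau$ basis and the $\tau$ spectrum values are stored, which is where the linear dependence on model size comes from when $\tau$ is a fixed small rank.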

Abstract

Second-order optimization approaches such as the generalized Gauss-Newton method are considered more powerful because they exploit the curvature information of the objective function through preconditioning matrices. Despite their tempting theoretical benefits, they are not easily applicable to modern deep learning, mainly because computing the inverse of the matrix requires memory quadratic and time cubic in the number of parameters. These requirements are infeasible even with state-of-the-art hardware. In this work, we propose Ginger, an eigendecomposition for the inverse of the generalized Gauss-Newton matrix. Our method has linear memory and time complexity per iteration. Instead of approximating the conditioning matrix, we directly maintain its inverse to make the approximation more accurate. We provide a convergence result for Ginger on non-convex objectives. Our experiments on different tasks with different model architectures verify the effectiveness of our method. Our code is publicly available.
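As a rough illustration of how a low-rank curvature estimate can be refreshed at a per-iteration cost linear in the number of parameters, the sketch below applies an exponential moving average with a new gradient and then truncates back to rank $\tau$ using a thin SVD. This is a generic technique applied to the GGN side, and it is not the paper's update rule, which maintains the inverse directly. The function name `ema_low_rank_update` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def ema_low_rank_update(U, sigma, g, alpha):
    """Rank-tau approximation of alpha * U diag(sigma) U^T + (1 - alpha) * g g^T.

    Assumptions (illustrative only, not the authors' algorithm):
      - U has shape (d, tau) with d > tau, sigma has shape (tau,) and is >= 0,
      - g is a gradient vector of shape (d,),
      - 0 < alpha < 1 is the EMA decay.
    """
    tau = sigma.shape[0]
    # Write the updated matrix as B B^T, where B has shape (d, tau + 1).
    B = np.concatenate(
        [U * np.sqrt(alpha * sigma), np.sqrt(1.0 - alpha) * g[:, None]],
        axis=1,
    )
    # Thin SVD of a tall matrix costs O(d * tau^2); no d x d matrix is built.
    W, s, _ = np.linalg.svd(B, full_matrices=False)
    # Left singular vectors and squared singular values of B are the
    # eigenvectors and eigenvalues of B B^T; keep the top tau of them.
    return W[:, :tau], s[:tau] ** 2
```

The thin SVD of the $d \times (\tau + 1)$ factor is what keeps the cost at $O(d\tau^2)$ time and $O(d\tau)$ memory, in contrast to the quadratic memory and cubic time of working with the full matrix.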

Paper Structure (34 sections, 5 theorems, 42 equations, 4 tables, 1 algorithm)

Introduction
Approach
Background: generalized Gauss--Newton and natural gradient methods
The generalized Gauss--Newton method.
The connection to natural gradient.
Stochastic natural gradient descent.
Quasi-natural gradient method
Our approach: Ginger
Querying the update direction.
Update rules.
Efficient SVD.
Theoretical analyses
Experiments
Image classification
Dataset.
...and 19 more sections

Key Result

Lemma 1

The approximation $G_{t,\gamma}^{-1}$ has bounded eigenvalues for all $t \ge 0$. Specifically, we have $0 < \lambda_{\min}(G_{t,\gamma}^{-1}) \le \lambda_{\max}(G_{t,\gamma}^{-1}) = \gamma^{-1}$ whenever $\tau < d$.
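One way to see where these bounds come from is the short derivation below. It assumes the damped low-rank form $G_{t,\gamma} = \gamma I + U_t \operatorname{diag}(\sigma_t) U_t^\top$ with $U_t \in \mathbb{R}^{d \times \tau}$ having orthonormal columns and $\sigma_{t,i} \ge 0$. These are assumptions made for illustration and may differ from the setup used in the paper's own proof.

$$
\begin{aligned}
G_{t,\gamma}^{-1}
  &= \gamma^{-1}\bigl(I - U_t U_t^\top\bigr)
   + U_t \operatorname{diag}\!\left(\frac{1}{\gamma + \sigma_{t,i}}\right) U_t^\top, \\
\operatorname{spec}\bigl(G_{t,\gamma}^{-1}\bigr)
  &= \Bigl\{ \underbrace{\gamma^{-1}, \dots, \gamma^{-1}}_{d - \tau \text{ times}} \Bigr\}
   \cup \Bigl\{ \tfrac{1}{\gamma + \sigma_{t,i}} : i = 1, \dots, \tau \Bigr\}, \\
0 < \frac{1}{\gamma + \max_i \sigma_{t,i}}
  &= \lambda_{\min}\bigl(G_{t,\gamma}^{-1}\bigr)
   \le \lambda_{\max}\bigl(G_{t,\gamma}^{-1}\bigr) = \gamma^{-1}.
\end{aligned}
$$

The condition $\tau < d$ guarantees that the orthogonal complement of the column space of $U_t$ is non-empty, so the eigenvalue $\gamma^{-1}$ is actually attained.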

Theorems & Definitions (10)

Lemma 1
proof
Lemma 2
proof
Theorem 1
proof
Lemma 2
proof
Lemma 2
proof
