Stochastic Hessian Fittings with Lie Groups
Xi-Lin Li
TL;DR
The paper develops a unified framework for stochastic Hessian fitting based on the PSGD preconditioner criterion, connecting classical second-order methods (BFGS, Gauss-Newton, natural gradient) with modern inverse-free, Lie-group-based preconditioner updates. It proves convexity properties of the criterion in the SPD and Lie-group geometries, including strong convexity on a polar quotient of GL(n, R), which enables linear convergence of SGD-type updates. It introduces several inverse-free and sparse Lie-group preconditioners (diagonal, Kronecker-product, low-rank), together with practical algorithms for large-scale problems, and reports theoretical and empirical results showing robust performance in noisy and time-varying settings. The work also links the updates to Newton-Schulz iterations and demonstrates the approach on tensor decomposition and transformer/GPT-scale tasks, highlighting improved stability and convergence without expensive matrix inverses or decompositions.
Abstract
This report investigates fitting the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion derived from the preconditioned stochastic gradient descent (PSGD) method. This criterion is closely related to many widely used second-order and adaptive gradient optimization methods, including BFGS, the Gauss-Newton algorithm, natural gradient descent, and AdaGrad. Our analyses reveal differences in efficiency and reliability across a broad range of preconditioner fitting methods, from closed-form to iterative approaches, using either Hessian-vector products or stochastic gradients only, with Hessian fitting carried out in various geometric settings (the Euclidean space, the manifold of symmetric positive definite (SPD) matrices, and a variety of Lie groups). The most intriguing finding is that the Hessian fitting problem is strongly convex under mild conditions in certain general Lie groups. This result turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant Lie group sparse preconditioner fitting methods for large-scale stochastic optimization.
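As a rough illustration of what a PSGD-style Hessian fitting criterion looks like, the sketch below assumes the form commonly given in the PSGD literature, c(P) = E[h^T P h + v^T P^{-1} v] with h = Hv and v drawn from a standard normal distribution; this excerpt does not state the criterion explicitly, so treat that form as an assumption. For a diagonal P the criterion separates per coordinate and has the closed-form minimizer p_i = sqrt(E[v_i^2] / E[h_i^2]). The function name `fit_diagonal_preconditioner` and the toy Hessian are hypothetical, chosen only for this example, and the code is not the paper's algorithm.

```python
import math
import random


def fit_diagonal_preconditioner(hessian, num_samples=2000, seed=0):
    """Closed-form diagonal minimizer of the sampled criterion
    sum_k (h_k^T P h_k + v_k^T P^{-1} v_k), where h_k = H v_k.

    Assumption: this is the PSGD-style criterion restricted to
    diagonal P, not a method taken verbatim from the paper.
    """
    rng = random.Random(seed)
    n = len(hessian)
    sum_v2 = [0.0] * n  # running sums of v_i^2
    sum_h2 = [0.0] * n  # running sums of h_i^2
    for _ in range(num_samples):
        # Random probe vector v ~ N(0, I).
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Hessian-vector product h = H v (explicit here; in practice
        # it would come from automatic differentiation).
        h = [sum(hessian[i][j] * v[j] for j in range(n)) for i in range(n)]
        for i in range(n):
            sum_v2[i] += v[i] ** 2
            sum_h2[i] += h[i] ** 2
    # Setting d/dp_i of (p_i * E[h_i^2] + E[v_i^2] / p_i) to zero gives
    # p_i = sqrt(E[v_i^2] / E[h_i^2]).
    return [math.sqrt(sum_v2[i] / sum_h2[i]) for i in range(n)]


# Toy indefinite diagonal Hessian (hypothetical values).
H = [[4.0, 0.0], [0.0, -0.5]]
p = fit_diagonal_preconditioner(H)
print(p)
```

With a diagonal Hessian, each fitted entry equals the reciprocal of the absolute value of the corresponding Hessian entry, so the preconditioner stays positive even where the curvature is negative. Only Hessian-vector products are used, and the Hessian is never inverted or decomposed.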
