SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Robert Gower, Martin Takáč
TL;DR
The paper addresses two problems: adaptive optimizers still require a manually tuned learning rate, and training neural models is hampered by ill-conditioning. It introduces SANIA, a general, parameter-free preconditioned Polyak framework that unifies first- and second-order methods and yields scale- and affine-invariant variants. Key contributions include the first stochastic Cubic Newton method with Polyak step-size, new scale-invariant AdaGrad-SQR and Adam-SQR variants, and SANIA PCG for Newton's method in convex and non-convex settings, supported by affine- and scale-invariance proofs and by experiments on convex and non-convex tasks. The approach aims at robust, tuning-free optimization under changes of basis and rescaling of the data, with practical relevance for deep learning and generalized linear models.
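To make the idea of a preconditioned Polyak step concrete, here is a minimal sketch, not the paper's implementation. It assumes a diagonal preconditioner B, a known optimal loss value `f_star` (zero here), and the classical Polyak rule measured in the B^{-1} metric. The function name `preconditioned_polyak_step` and the toy quadratic are illustrative choices.

```python
import numpy as np

def preconditioned_polyak_step(w, loss, grad, precond_diag, f_star=0.0):
    """One step w - gamma * B^{-1} g, with gamma = (f(w) - f*) / (g^T B^{-1} g).

    precond_diag holds the diagonal of B (assumed positive).
    f_star is the assumed-known minimum loss value.
    """
    g = grad(w)
    direction = g / precond_diag      # B^{-1} g for a diagonal B
    denom = g @ direction             # squared norm of g in the B^{-1} metric
    if denom == 0.0:
        return w                      # zero gradient: nothing to update
    gamma = (loss(w) - f_star) / denom
    return w - gamma * direction

# Badly scaled quadratic f(w) = 0.5 * sum(d_i * w_i^2); its minimum value is 0.
d = np.array([1.0, 100.0, 10000.0])
loss = lambda w: 0.5 * np.sum(d * w**2)
grad = lambda w: d * w

w = np.ones(3)
for _ in range(50):
    # Using the exact Hessian diagonal as B: every coordinate is halved per step.
    w = preconditioned_polyak_step(w, loss, grad, precond_diag=d)
final_loss = loss(w)
```

In this toy case the step size works out to 0.5 at every iteration regardless of how the coordinates are scaled, and no learning rate is supplied by the user.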
Abstract
Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam, AdaGrad, and AdaHessian utilize a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. However, despite their adaptive characteristics, these methods still require manual fine-tuning of the step-size. This, in turn, impacts the time required to solve a particular problem. This paper presents an optimization framework named SANIA to tackle these challenges. Beyond eliminating the need for manual step-size hyperparameter settings, SANIA incorporates techniques to address poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including Hutchinson's method, which approximates the Hessian diagonal of the loss function. We conclude with an extensive empirical examination of the proposed techniques across classification tasks, covering both convex and non-convex contexts.
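As a rough illustration of the Hutchinson-style diagonal estimate mentioned above, the sketch below uses the identity diag(H) = E[z ⊙ (Hz)] for Rademacher vectors z. It is a minimal NumPy version under stated assumptions: the Hessian-vector product `hvp` is supplied by the caller (here an explicit small matrix `A` stands in for the Hessian), and the name `hutchinson_diag` is hypothetical, not taken from the paper.

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=100, rng=None):
    """Estimate the Hessian diagonal by averaging z * (H z) over random sign vectors z.

    hvp: callable returning the Hessian-vector product H @ v for a vector v.
    """
    rng = np.random.default_rng() if rng is None else rng
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe vector
        est += z * hvp(z)                        # elementwise product z ⊙ (H z)
    return est / num_samples

# Stand-in Hessian for the example (symmetric, with nonzero off-diagonal entries).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
hvp = lambda v: A @ v

diag_est = hutchinson_diag(hvp, dim=3, num_samples=5000,
                           rng=np.random.default_rng(0))
```

With enough probes, `diag_est` approaches the true diagonal (4, 3, 2); the off-diagonal entries contribute only zero-mean noise. In practice the Hessian-vector product would come from automatic differentiation rather than an explicit matrix.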
