AutoGD: Automatic Learning Rate Selection for Gradient Descent
Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell
TL;DR
AutoGD tackles a central practical difficulty in gradient-based optimization: tuning the learning rate. The core mechanism pairs the gradient step $x_{t+1}=x_t-\gamma_t\nabla f(x_t)$ with the candidate set $\{0, c^{-1}\gamma_t, \gamma_t, c\gamma_t\}$. At each iteration, the step size is chosen from this small set around a baseline, which includes a no-movement option, and an Armijo condition enforces descent. The approach comes with both asymptotic and nonasymptotic guarantees. It converges to a local minimum for $L$-smooth objectives, including locally $\mu$-strongly convex ones, without knowledge of $L$ or $\mu$. Under mild unimodality assumptions, it recovers the optimal GD rate up to a constant. Experiments show that the method is robust across classical optimization problems and variational-inference tasks, often outperforming backtracking line search and standard GD. Its extensions, AutoBFGS and AutoLBFGS, also show substantial practical gains. These results suggest a broadly useful, tuning-free optimization primitive, suitable for inner-loop use in larger algorithms and in stochastic settings. Promising directions for future work include richer learning-rate grids and second-order variants.
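The TL;DR specifies the candidate set and the Armijo safeguard. It does not say how AutoGD chooses among passing candidates or how the baseline evolves, so the sketch below fills those gaps with labeled assumptions. It treats the Armijo condition as $f(x_t-\gamma\nabla f(x_t)) \le f(x_t) - \sigma\gamma\|\nabla f(x_t)\|^2$. It then makes three hypothetical choices:

- Among the candidates that pass, it keeps the one with the lowest objective value.
- The accepted rate becomes the next baseline.
- The baseline shrinks by $c$ when only the no-movement option passes.

The function name `autogd` and the defaults for `gamma0`, `c`, `sigma` and `num_iters` are illustrative, not the authors' implementation.

```python
import numpy as np


def autogd(f, grad, x0, gamma0=1.0, c=2.0, sigma=1e-4, num_iters=100):
    """Hypothetical AutoGD-style loop (selection and baseline rules are assumptions)."""
    x = np.asarray(x0, dtype=float)
    gamma = gamma0  # current baseline learning rate
    for _ in range(num_iters):
        g = grad(x)
        fx = f(x)
        sq_norm = float(np.dot(g, g))
        # The no-movement option (rate 0) always satisfies the Armijo condition.
        best_rate, best_val = 0.0, fx
        for rate in (gamma / c, gamma, c * gamma):
            val = f(x - rate * g)
            # Armijo sufficient-decrease check; keep the lowest passing objective (assumed rule).
            if val <= fx - sigma * rate * sq_norm and val < best_val:
                best_rate, best_val = rate, val
        x = x - best_rate * g
        # Assumed baseline update: adopt the accepted rate, or shrink if nothing moved.
        gamma = best_rate if best_rate > 0 else gamma / c
    return x, gamma


# Usage: an ill-conditioned quadratic with smoothness constant L = 100.
A = np.diag([1.0, 100.0])
f_quad = lambda x: 0.5 * x @ A @ x
grad_quad = lambda x: A @ x
x0 = np.array([1.0, 1.0])
x_final, gamma_final = autogd(f_quad, grad_quad, x0, gamma0=1.0, num_iters=500)
print(f_quad(x0), f_quad(x_final), gamma_final)
```

With a fixed step of 1.0, plain GD would diverge on this quadratic, because the second coordinate is multiplied by $1 - 100 = -99$ at every step. In this sketch, rejected candidates instead shrink the baseline until some step passes the Armijo check. Since every accepted step satisfies that check, the objective never increases.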
Abstract
The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate learning rate schedule. When such methods appear as inner loops of other algorithms, expecting the user to tune the learning rates may be impractical. To address this, we introduce AutoGD: a gradient descent method that automatically determines whether to increase or decrease the learning rate at a given iteration. We establish the convergence of AutoGD, and show that we can recover the optimal rate of GD (up to a constant) for a broad class of functions without knowledge of smoothness constants. Experiments on a variety of traditional problems and variational inference optimization tasks demonstrate strong performance of the method, along with its extensions to AutoBFGS and AutoLBFGS.
