An adaptive framework for first-order gradient methods
Xiaozhe Hu, Sara Pollock, Zhongqin Xue, Yunrong Zhu
TL;DR
The paper addresses tuning first-order gradient methods when the strong convexity parameter $\mu$ is unknown. It introduces a unified adaptive framework that takes the geometric mean of the ratios of successive residual norms as an empirical bound $\rho^*$ on the convergence rate. This bound drives adaptive updates of the step size $\alpha$ and momentum $\beta$ for gradient descent (GD), Nesterov's accelerated gradient (NAG), and heavy ball (HB), with $L$ normalized to $1$ so that the focus is on exploiting curvature. The authors prove that, for quadratic problems, the adaptive schemes converge at least as fast as gradient descent with $\alpha=1/L$. Experiments on quadratic problems, logistic regression, and Huber-TV denoising show performance comparable to the same methods run with optimal parameters, and in some cases better, because the adaptive schemes capture local curvature. The methods are simple to implement and require only a few additional computations on information the iteration already produces.
Abstract
Gradient methods are widely used in optimization problems. In practice, while the smoothness parameter can be estimated utilizing techniques such as backtracking, estimating the strong convexity parameter remains a challenge; moreover, even with the optimal parameter choice, convergence can be slow. In this work, we propose a framework for dynamically adapting the step size and momentum parameters in first-order gradient methods for the optimization problem, without prior knowledge of the strong convexity parameter. The main idea is to use the geometric average of the ratios of successive residual norms as an empirical estimate of the upper bound on the convergence rate, which in turn allows us to adaptively update the algorithm parameters. The resulting algorithms are simple to implement, yet efficient in practice, requiring only a few additional computations on existing information. The proposed adaptive gradient methods are shown to converge at least as fast as gradient descent for quadratic optimization problems. Numerical experiments on both quadratic and nonlinear problems validate the effectiveness of the proposed adaptive algorithms. The results show that the adaptive algorithms are comparable to their counterparts using optimal parameters, and in some cases, they capture local information and exhibit improved performance.
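To make the central quantity concrete, the following is a minimal sketch of the geometric mean of successive residual-norm ratios, computed for plain gradient descent with step size $1/L$. It is an illustration under stated assumptions, not the authors' algorithm: the diagonal quadratic test problem, the function names `residual_norms_gd` and `geometric_mean_rate`, and the use of the gradient as the residual are all choices made here, and the paper's actual parameter-update rules are not reproduced. For this setting the standard bound $\|r_{k+1}\| \le (1-\mu/L)\,\|r_k\|$ holds, so the empirical rate cannot exceed $1-\mu/L$.

```python
import math


def residual_norms_gd(eigs, x0, num_iters):
    """Gradient descent with step 1/L on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    Illustrative assumption: a diagonal quadratic, so the residual is the
    gradient, with components eigs[i] * x[i].
    Returns the residual norms ||r_0||, ..., ||r_num_iters||.
    """
    L = max(eigs)  # smoothness constant of the diagonal quadratic
    x = list(x0)
    norms = []
    for _ in range(num_iters + 1):
        grad = [lam * xi for lam, xi in zip(eigs, x)]
        norms.append(math.sqrt(sum(g * g for g in grad)))
        x = [xi - g / L for xi, g in zip(x, grad)]
    return norms


def geometric_mean_rate(norms):
    """Geometric mean of the successive ratios ||r_{j+1}|| / ||r_j||.

    The product of the ratios telescopes, so this equals
    (||r_k|| / ||r_0||) ** (1 / k) with k = len(norms) - 1.
    """
    k = len(norms) - 1
    log_sum = sum(math.log(norms[j + 1] / norms[j]) for j in range(k))
    return math.exp(log_sum / k)


# Hypothetical test problem: mu = 0.01, L = 1.0.
eigs = [0.01, 0.1, 0.5, 1.0]
norms = residual_norms_gd(eigs, x0=[1.0, 1.0, 1.0, 1.0], num_iters=50)
rho = geometric_mean_rate(norms)
worst_case = 1.0 - min(eigs) / max(eigs)
print(f"empirical rate {rho:.4f}, worst-case GD rate {worst_case:.4f}")
```

The gap between the empirical rate and the worst-case rate reflects how much of the initial residual lies in directions with larger curvature. This is the kind of local information an adaptive scheme could exploit, although how the paper maps the estimate to $\alpha$ and $\beta$ is not shown here.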
