Triangle Steepest Descent: A Geometry-Based Gradient Algorithm with Guaranteed R-Linear Convergence
Ya Shen, Qing-Na Li, Yu-Hong Dai
TL;DR
Triangle Steepest Descent (TSD) is a geometry-driven gradient variant that periodically aggregates past gradient directions, with the period set by a cycle parameter $j$, and uses exact line searches. The authors prove that TSD is at least R-linearly convergent on strongly convex quadratic problems, with rate factor $\left(\frac{\kappa-1}{\kappa+1}\right)^{\frac{j-1}{j}}$, and observe superlinear behavior in practice, outperforming the Barzilai-Borwein (BB) and monotone Dai-Yuan (DY) methods on ill-conditioned quadratics. On general unconstrained problems, empirical studies show that TSD is competitive with, but not universally superior to, state-of-the-art methods such as BBQ and ABBmin2, while exhibiting clear advantages in certain spectral regimes. The work highlights the value of embedding geometric information into gradient directions and sets the stage for extensions to non-quadratic objectives and adaptive strategies for choosing the cycle parameter $j$.
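To get a feel for the rate factor quoted above, the following Python sketch simply evaluates $\left(\frac{\kappa-1}{\kappa+1}\right)^{\frac{j-1}{j}}$ for a given condition number and cycle parameter. The function name `tsd_rate_bound` and the sample values of $\kappa$ and $j$ are illustrative assumptions, not taken from the paper, and the sketch says nothing about which error measure the bound applies to.

```python
def tsd_rate_bound(kappa: float, j: int) -> float:
    """Evaluate ((kappa - 1) / (kappa + 1)) ** ((j - 1) / j).

    Hypothetical helper: it only evaluates the formula quoted in the TL;DR.
    kappa is the condition number (>= 1), j the cycle parameter (>= 1).
    """
    if kappa < 1.0:
        raise ValueError("condition number kappa must be >= 1")
    if j < 1:
        raise ValueError("cycle parameter j must be >= 1")
    base = (kappa - 1.0) / (kappa + 1.0)
    return base ** ((j - 1) / j)


if __name__ == "__main__":
    # Sample values chosen for illustration only.
    for kappa in (10.0, 1e3, 1e6):
        for j in (2, 5, 20):
            print(f"kappa={kappa:g}, j={j}: bound={tsd_rate_bound(kappa, j):.6f}")
```

Two properties follow directly from the formula: for $j = 1$ the exponent is zero, so the factor equals 1 and the bound guarantees no contraction, and as $j$ grows the exponent tends to 1, so the factor approaches $\frac{\kappa-1}{\kappa+1}$.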
Abstract
Gradient methods are among the simplest yet most widely used algorithms for unconstrained optimization. Motivated by a geometric property of the steepest descent (SD) method that can alleviate the zigzag behavior in quadratic problems, we develop a new gradient variant called the Triangle Steepest Descent (TSD) method. The TSD method introduces a cycle parameter $j$ that governs the periodic combination of past search directions, providing a geometry-driven mechanism to enhance convergence. To the best of our knowledge, TSD is the first formally established geometry-based gradient scheme since Akaike (1959). We prove that TSD is at least R-linearly convergent for strongly convex quadratic problems and demonstrate through extensive numerical experiments that it exhibits superlinear behavior, outperforming the Barzilai-Borwein (BB) method and the monotone Dai-Yuan (DY) gradient method on quadratic problems. These results suggest that incorporating geometric information into gradient directions offers a promising avenue for developing efficient optimization algorithms.
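The zigzag behavior mentioned in the abstract can be reproduced with the classical SD baseline. The sketch below is not the TSD method, whose update rule is not given in this section. It runs steepest descent with exact line search on a diagonal quadratic $f(x) = \frac{1}{2}\sum_i d_i x_i^2$, where the step size is $\alpha_k = \frac{g_k^\top g_k}{g_k^\top A g_k}$ with $A = \mathrm{diag}(d)$. The function names, the test problem, and the starting point are assumptions chosen for illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sd_exact(d, x0, iters):
    """Classical steepest descent with exact line search on
    f(x) = 0.5 * sum(d_i * x_i**2), with all d_i > 0.

    Returns the final iterate and the list of gradients visited.
    Illustrative baseline only; this is not the TSD update.
    """
    x = list(x0)
    grads = []
    for _ in range(iters):
        g = [di * xi for di, xi in zip(d, x)]       # gradient A x
        grads.append(g)
        gg = dot(g, g)
        if gg == 0.0:                               # already at the minimizer
            break
        gAg = sum(di * gi * gi for di, gi in zip(d, g))
        alpha = gg / gAg                            # exact (Cauchy) step size
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, grads


# Assumed test problem: condition number 100, starting point chosen
# so that both eigen-components of the gradient are equally large.
d = [1.0, 100.0]
x_final, grads = sd_exact(d, [100.0, 1.0], 6)

for k in range(len(grads) - 1):
    # Exact line search makes consecutive gradients orthogonal.
    print(k, dot(grads[k], grads[k + 1]))
```

Because consecutive gradients are orthogonal under exact line search, in two dimensions the search directions alternate between two fixed lines, which is the zigzag pattern. A diagonal Hessian is used only to keep the sketch dependency-free.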
