A Homogenization Approach for Gradient-Dominated Stochastic Optimization
Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye
TL;DR
The paper studies stochastic non-convex optimization under gradient dominance with index $\alpha\in[1,2]$ and introduces SHSODM, a homogenization-based second-order method. SHSODM replaces the cubic-regularized subproblem with an eigenvalue subproblem, which gives a per-iteration cost of $\tilde{O}(n^2)$ and avoids expensive linear-system solves. The paper proves that SHSODM attains sample complexity comparable to SCRN (stochastic cubic-regularized Newton), and optimal in some regimes; variance-reduced variants further improve the rate to $O(\epsilon^{-2/\alpha})$ for $\alpha\in[1,3/2)$. The method uses an adaptive line search over the augmented-matrix parameter $\delta_k$, a gradient perturbation to handle degenerate directions, and Lanczos-type solvers to compute the leftmost eigenpair. Empirically, SHSODM outperforms SCRN and first-order baselines on reinforcement learning tasks and is more robust, which highlights its practicality for ill-conditioned problems. The work extends homogenization techniques to gradient-dominated stochastic optimization and offers a scalable route to second-order methods for large-scale problems.
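The eigenvalue subproblem mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the augmented matrix has the block form $[H, g;\ g^\top, -\delta]$ used in homogenized second-order methods, where `g` and `H` stand for mini-batch gradient and Hessian estimates. The function name `homogenized_direction`, the tolerance `tol`, and the fallback for the degenerate case are hypothetical. A dense `numpy.linalg.eigh` call stands in for the Lanczos-type solver, which would need only matrix-vector products.

```python
import numpy as np


def homogenized_direction(g, H, delta, tol=1e-8):
    """Return (leftmost eigenvalue, direction) from the augmented matrix.

    Assumed form (not confirmed by the text above):
        F = [[H,   g     ],
             [g^T, -delta]]
    """
    n = g.shape[0]
    F = np.empty((n + 1, n + 1))
    F[:n, :n] = H
    F[:n, n] = g
    F[n, :n] = g
    F[n, n] = -delta

    # eigh returns eigenvalues in ascending order, so column 0 is the
    # leftmost eigenpair. A Lanczos-type solver would replace this call
    # in a large-scale setting.
    eigvals, eigvecs = np.linalg.eigh(F)
    lam = eigvals[0]
    v, t = eigvecs[:n, 0], eigvecs[n, 0]

    if abs(t) > tol:
        # Regular case: d solves (H - lam * I) d = -g.
        d = v / t
    else:
        # Degenerate case (t close to 0): v is a curvature direction
        # nearly orthogonal to g. Choosing the sign so that g @ d <= 0 is
        # an illustrative fallback; the paper instead perturbs the gradient.
        d = -v if g @ v > 0 else v
    return lam, d


if __name__ == "__main__":
    H = np.array([[2.0, 0.0], [0.0, -1.0]])  # indefinite Hessian estimate
    g = np.array([1.0, 1.0])                 # gradient estimate
    lam, d = homogenized_direction(g, H, delta=0.5)
    print("leftmost eigenvalue:", lam)
    print("direction:", d)
```

In the regular case, the eigenvector equations give $(H - \lambda I)d = -g$ with $\lambda \le -\delta$, so the direction behaves like a regularized Newton step obtained without solving a linear system.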
Abstract
The gradient dominance property is a condition weaker than strong convexity, yet it suffices to ensure global convergence even in non-convex optimization. This property finds wide application in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying the gradient dominance property, based on a recently proposed homogenization approach. Theoretically, we provide its sample complexity analysis and further present an enhanced result obtained by incorporating variance reduction techniques. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated stochastic optimization, but without cubic regularization. Empirically, since the homogenization approach relies only on solving an extremal eigenvector problem at each iteration instead of a Newton-type system, our methods gain the advantages of cheaper computational cost and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the better performance of SHSODM compared to other off-the-shelf methods.
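For reference, the gradient dominance property with index $\alpha$ is commonly written as below. The constant name $C$ and the exact normalization are assumptions here and may differ from the paper's statement.

```latex
f(x) - f^{*} \;\le\; C \,\|\nabla f(x)\|^{\alpha},
\qquad \alpha \in [1, 2], \quad \text{for all } x \in \mathbb{R}^{n},
\qquad f^{*} = \inf_{x} f(x).
```

The case $\alpha = 2$ is the Polyak-Łojasiewicz condition. A $\mu$-strongly convex function satisfies it with $C = 1/(2\mu)$, which is why gradient dominance is the weaker condition.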
