Cubic regularized subspace Newton for non-convex optimization
Jim Zhao, Aurelien Lucchi, Nikita Doikov
TL;DR
This work addresses the challenge of optimizing non-convex functions in high dimensions by introducing SSCN, a stochastic subspace cubic Newton method that applies cubic regularization to a randomly selected subset of coordinates. By combining second-order information restricted to a subspace with cubic regularization and a flexible sampling strategy, SSCN achieves global convergence to stationary points and interpolates between coordinate descent and full cubic Newton as the subspace size τ grows. The authors establish convergence guarantees for arbitrary τ, derive enhanced rates with exact Hessian information, and present an adaptive sampling scheme that adjusts τ dynamically to attain a second-order stationary point at a rate of $\mathcal{O}(ε^{-3/2}, ε^{-3})$. They also demonstrate substantial empirical speed-ups over first-order methods on standard datasets. These results enable efficient, scalable second-order optimization in over-parameterized machine learning settings where full Hessian computations are prohibitive.
Abstract
This paper addresses the problem of minimizing non-convex continuous functions, which is relevant to high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying cubic regularization in random subspaces. This approach reduces the computational cost of using second-order information, making it applicable in higher-dimensional settings. Theoretically, we establish convergence guarantees for non-convex functions, with interpolating rates for arbitrary subspace sizes and allowing inexact curvature estimation. As the subspace size increases, our complexity matches the $\mathcal{O}(ε^{-3/2})$ rate of cubic regularization (CR). Additionally, we propose an adaptive sampling scheme ensuring an exact convergence rate of $\mathcal{O}(ε^{-3/2}, ε^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate substantial speed-ups achieved by SSCN compared to conventional first-order methods.
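To make the idea of "cubic regularization in random subspaces" concrete, the following is a minimal NumPy sketch of one possible step, not the authors' implementation. It assumes uniform sampling of τ coordinates without replacement, a fixed regularization constant `M`, and a simple bisection solver for the cubic subproblem that ignores the degenerate "hard case". The names `cubic_subproblem`, `sscn_step`, `grad_fn`, and `hess_fn`, as well as the toy objective, are hypothetical and chosen only for illustration.

```python
import numpy as np

def cubic_subproblem(g, H, M, tol=1e-10, max_iter=200):
    """Approximately minimize g@h + 0.5*h@H@h + (M/6)*||h||^3 over h.

    Uses the stationarity condition (H + (M*r/2) I) h = -g with r = ||h||
    and bisection on r. The degenerate "hard case" is not handled.
    """
    lam, Q = np.linalg.eigh(H)          # eigenvalues in ascending order
    g_rot = Q.T @ g
    # H + (M*r/2) I must be positive definite, so r > max(0, -2*lam_min/M).
    r_lo = max(0.0, -2.0 * lam[0] / M)

    def step_norm(r):
        return np.linalg.norm(g_rot / (lam + 0.5 * M * r))

    # Grow the upper bracket until ||h(r)|| <= r.
    r_hi = 2.0 * r_lo + 1.0
    while step_norm(r_hi) > r_hi:
        r_hi *= 2.0
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo < tol:
            break
    return -Q @ (g_rot / (lam + 0.5 * M * r_hi))

def sscn_step(x, grad_fn, hess_fn, tau, M, rng):
    """One subspace cubic Newton step on tau uniformly sampled coordinates."""
    S = rng.choice(x.size, size=tau, replace=False)
    # For simplicity the full gradient and Hessian are formed and then sliced;
    # a practical implementation would evaluate only the restricted blocks.
    g_S = grad_fn(x)[S]
    H_SS = hess_fn(x)[np.ix_(S, S)]
    x_new = x.copy()
    x_new[S] += cubic_subproblem(g_S, H_SS, M)
    return x_new

# Toy non-convex objective (an assumption for this demo):
# f(x) = 0.5 * x^T A x + sum(cos(x)), whose Hessian is 1-Lipschitz.
rng = np.random.default_rng(0)
n = 20
B = rng.standard_normal((n, n))
A = B.T @ B / (4 * n)
f = lambda x: 0.5 * x @ A @ x + np.cos(x).sum()
grad = lambda x: A @ x - np.sin(x)
hess = lambda x: A - np.diag(np.cos(x))

x = rng.standard_normal(n)
values = [f(x)]
for _ in range(100):
    x = sscn_step(x, grad, hess, tau=5, M=1.0, rng=rng)
    values.append(f(x))
```

In this sketch, setting `tau=1` reduces each step to a one-dimensional cubic-regularized update on a single coordinate, while `tau=n` recovers a full cubic Newton step, which mirrors the interpolation described above. Because `M` is chosen no smaller than the Hessian Lipschitz constant of the toy objective, each step should not increase `f` beyond the solver tolerance.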
