Adaptive Multilevel Newton: A Quadratically Convergent Optimization Method
Nick Tsipinakis, Panos Parpas, Matthias Voigt
TL;DR
The paper addresses the inefficiency of pure Newton methods in the early phase of optimization by proposing Adaptive Multilevel (AML) Newton, which adaptively switches between coarse-level subspace steps and full Newton steps so that the iterates reach the local quadratic convergence regime. It provides rigorous results: local quadratic rates for strongly convex functions with Lipschitz continuous Hessians and for self-concordant functions, plus probabilistic quadratic convergence under Johnson–Lindenstrauss-type subspace embeddings. Extensive experiments indicate that cheap early iterations paired with principled level-acceptance rules can outperform Newton's method, Gradient Descent, and classical multilevel Newton methods on structured, low-rank problems. The approach is relevant to large-scale, ill-conditioned optimization where second-order information is crucial but expensive to compute from scratch. Overall, AML-Newton combines automatic level selection with adaptive stepping to achieve fast, robust convergence at a rate comparable to Newton's while reducing total runtime in the critical initial phase.
Abstract
Newton's method may exhibit slower convergence than vanilla Gradient Descent in its initial phase on strongly convex problems. Classical Newton-type multilevel methods mitigate this but, like Gradient Descent, achieve only linear convergence near the minimizer. We introduce an adaptive multilevel Newton-type method with a principled automatic switch to full Newton once its quadratic phase is reached. The local quadratic convergence for strongly convex functions with Lipschitz continuous Hessians and for self-concordant functions is established and confirmed empirically. Although per-iteration cost can exceed that of classical multilevel schemes, the method is efficient and consistently outperforms Newton's method, Gradient Descent, and the multilevel Newton method, indicating that second-order methods can outperform first-order methods even when Newton's method is initially slow.
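To make the idea of cheap coarse-level steps followed by a switch to full Newton concrete, here is a minimal Python sketch. It is not the authors' algorithm. The test objective, the Gaussian prolongation matrix `P`, the switching rule (`kappa`, `switch_tol`, `max_coarse`), and all function names are assumptions chosen for illustration, and the paper's own level-acceptance rule may differ. For clarity the sketch forms the full Hessian even on coarse steps, so it does not reproduce the cost savings of a real multilevel scheme.

```python
import numpy as np

def make_problem(n, seed=0):
    # Hypothetical strongly convex test problem: A is symmetric positive definite.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    return M.T @ M / n + np.eye(n), rng.standard_normal(n)

def f(x, A, b):
    return 0.5 * x @ A @ x - b @ x + 0.25 * np.sum(x**4)

def grad(x, A, b):
    return A @ x - b + x**3

def hess(x, A):
    return A + 3.0 * np.diag(x**2)

def backtrack(x, d, g, A, b, alpha=1e-4, beta=0.5, max_halvings=60):
    # Armijo backtracking line search along the descent direction d.
    t, fx, slope = 1.0, f(x, A, b), g @ d
    for _ in range(max_halvings):
        if f(x + t * d, A, b) <= fx + alpha * t * slope:
            break
        t *= beta
    return t

def hybrid_newton(x0, A, b, m=5, kappa=0.1, switch_tol=1e-1,
                  max_coarse=30, tol=1e-8, max_iter=100, seed=0):
    # Assumed stand-in rule: take a coarse step while the gradient is still
    # large, the coarse gradient captures enough of it, and the coarse-step
    # budget is not used up; otherwise take a full (damped) Newton step.
    rng = np.random.default_rng(seed)
    x, n, history = x0.copy(), x0.size, []
    for _ in range(max_iter):
        g = grad(x, A, b)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            break
        H = hess(x, A)
        P = rng.standard_normal((n, m)) / np.sqrt(m)  # random n-by-m prolongation
        gc = P.T @ g                                  # coarse-level gradient
        use_coarse = (gnorm > switch_tol
                      and np.linalg.norm(gc) >= kappa * gnorm
                      and history.count("coarse") < max_coarse)
        if use_coarse:
            # Newton step restricted to the m-dimensional subspace range(P).
            d = -P @ np.linalg.solve(P.T @ H @ P, gc)
            history.append("coarse")
        else:
            d = -np.linalg.solve(H, g)
            history.append("newton")
        x = x + backtrack(x, d, g, A, b) * d
    return x, history
```

A possible usage: `A, b = make_problem(20, seed=1)` followed by `x, history = hybrid_newton(np.full(20, 3.0), A, b)`. The returned `history` records which kind of step was taken at each iteration, which makes the switch from coarse steps to full Newton steps easy to inspect.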
