Block cubic Newton with greedy selection
Andrea Cristofari
TL;DR
This work addresses unconstrained minimization of functions with Lipschitz continuous Hessians by introducing Inexact Block Cubic Newton (IBCN), a second-order block coordinate method with a greedy (Gauss-Southwell) block selection rule. At each iteration, the method inexactly minimizes a cubic model on the chosen block, using trust-region-style updates and an adaptively updated parameter $σ_k$. It achieves global convergence and the following worst-case iteration bounds: ${\cal O}(ε^{-3/2})$ iterations to drive the stationarity violation on the selected block below $ε$, and ${\cal O}(ε^{-2})$ iterations to drive the stationarity violation with respect to all variables below $ε$, improving over prior cyclic-type results. Numerical experiments on sparse least-squares and regularized logistic regression show that IBCN often outperforms cyclic and random block updates, particularly for larger block sizes and when Hessian information can be exploited effectively. The method does not require knowledge of the Hessian Lipschitz constant, allows block sizes to change across iterations, and is accompanied by public code, which makes it suited to nonconvex and large-scale problems.
Abstract
A second-order block coordinate descent method is proposed for the unconstrained minimization of an objective function with a Lipschitz continuous Hessian. At each iteration, a block of variables is selected by means of a greedy (Gauss-Southwell) rule which considers the amount of first-order stationarity violation, then an approximate minimizer of a cubic model is computed for the block update. In the proposed scheme, blocks are not required to have a predetermined structure and their size may change during the iterations. For non-convex objective functions, global convergence to stationary points is proved and a worst-case iteration complexity analysis is provided. In particular, given a tolerance $ε$, we show that at most ${\cal O}(ε^{-3/2})$ iterations are needed to drive the stationarity violation with respect to a selected block of variables below $ε$, while at most ${\cal O}(ε^{-2})$ iterations are needed to drive the stationarity violation with respect to all variables below $ε$. Numerical results are finally given, comparing the proposed approach with other second-order methods and block selection rules.
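
To make the iteration described above concrete, the following is a minimal sketch of a block cubic Newton loop with greedy selection. It is not the paper's algorithm: the function names (`greedy_block`, `cubic_step`, `block_cubic_newton`), the choice of the largest-magnitude gradient entries as the greedy block, the model constant $σ/6$, the bisection solver for the block subproblem, and the halving/doubling update of $σ$ with acceptance threshold `eta` are all illustrative assumptions. The sketch also forms the full Hessian for simplicity and ignores the degenerate "hard case" of the cubic subproblem.

```python
import numpy as np


def greedy_block(grad, block_size):
    """Assumed greedy rule: indices of the largest-magnitude gradient entries."""
    return np.argsort(-np.abs(grad))[:block_size]


def cubic_step(g, H, sigma, n_bisect=200):
    """Approximately minimize m(s) = g's + 0.5 s'Hs + (sigma/6)||s||^3.

    A stationary point satisfies (H + lam*I) s = -g with lam = (sigma/2)||s||,
    so we bisect on lam in the eigenbasis of H (hard case not handled).
    """
    w, V = np.linalg.eigh(H)      # eigenvalues in ascending order
    g_t = V.T @ g                 # gradient in the eigenbasis

    def phi(lam):
        # Positive while the step for this lam is longer than 2*lam/sigma.
        return np.linalg.norm(g_t / (w + lam)) - 2.0 * lam / sigma

    lo = max(0.0, -w[0]) + 1e-12  # keep H + lam*I positive definite
    if phi(lo) <= 0.0:            # zero gradient or degenerate case
        return -V @ (g_t / (w + lo))
    hi = lo + 1.0
    while phi(hi) > 0.0:          # expand until the root is bracketed
        hi = lo + 2.0 * (hi - lo)
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return -V @ (g_t / (w + hi))


def block_cubic_newton(f, grad, hess, x0, block_size=2, sigma=1.0,
                       eta=0.1, tol=1e-6, max_iter=500):
    """Illustrative loop: greedy block, cubic-model step, adaptive sigma."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g, np.inf) <= tol:   # full first-order stationarity
            break
        idx = greedy_block(g, block_size)
        g_b = g[idx]
        H_b = hess(x)[np.ix_(idx, idx)]        # block of the Hessian
        s = cubic_step(g_b, H_b, sigma)
        model_decrease = -(g_b @ s + 0.5 * s @ H_b @ s
                           + sigma / 6.0 * np.linalg.norm(s) ** 3)
        x_trial = x.copy()
        x_trial[idx] += s
        rho = (f(x) - f(x_trial)) / max(model_decrease, 1e-16)
        if rho >= eta:                         # successful: accept, relax sigma
            x = x_trial
            sigma = max(0.5 * sigma, 1e-8)
        else:                                  # unsuccessful: regularize more
            sigma *= 2.0
    return x
```

As a usage example, calling `block_cubic_newton` with a strongly convex quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$ (passing `f`, its gradient `A @ x - b`, and the constant Hessian `A`) should return a point close to the solution of $Ax = b$. In this sketch $σ$ is adjusted from the observed ratio of actual to model decrease, so no Lipschitz constant of the Hessian is supplied, and `block_size` could be varied between iterations without changing the rest of the loop.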
