Block cubic Newton with greedy selection

Andrea Cristofari

TL;DR

This work addresses unconstrained minimization of functions with Lipschitz continuous Hessians by introducing Inexact Block Cubic Newton (IBCN), a second-order block coordinate method with a greedy Gauss-Southwell block selection rule. It combines a cubic model on the chosen block with inexact minimizers, trust-region-style updates, and an adaptively updated $σ_k$ to achieve global convergence and worst-case iteration bounds: $\mathcal{O}(ε^{-3/2})$ iterations to drive the stationarity violation with respect to the selected block below $ε$, and $\mathcal{O}(ε^{-2})$ iterations to drive the stationarity violation with respect to all variables below $ε$, improving over prior cyclic-type results. Numerical experiments on sparse least squares and regularized logistic regression show that IBCN often outperforms cyclic and random block updates, particularly for larger block sizes and when Hessian information can be used effectively. The method does not require knowledge of the Hessian Lipschitz constant, accommodates changing block sizes, and comes with public code, which makes it practical for nonconvex and large-scale problems.

Abstract

A second-order block coordinate descent method is proposed for the unconstrained minimization of an objective function with a Lipschitz continuous Hessian. At each iteration, a block of variables is selected by means of a greedy (Gauss-Southwell) rule which considers the amount of first-order stationarity violation, then an approximate minimizer of a cubic model is computed for the block update. In the proposed scheme, blocks are not required to have a predetermined structure and their size may change during the iterations. For non-convex objective functions, global convergence to stationary points is proved and a worst-case iteration complexity analysis is provided. In particular, given a tolerance $ε$, we show that at most $\mathcal{O}(ε^{-3/2})$ iterations are needed to drive the stationarity violation with respect to a selected block of variables below $ε$, while at most $\mathcal{O}(ε^{-2})$ iterations are needed to drive the stationarity violation with respect to all variables below $ε$. Numerical results are finally given, comparing the proposed approach with other second-order methods and block selection rules.
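To make the two ingredients named in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the greedy rule picks the `q` coordinates with the largest absolute gradient entries, that the cubic model uses the coefficient `sigma / 6`, and that a Cauchy-type step along the negative block gradient stands in for the paper's approximate minimizer. The function names `select_block`, `cubic_model` and `cauchy_step` are hypothetical.

```python
import numpy as np


def select_block(grad, q):
    """Greedy (Gauss-Southwell-type) choice: indices of the q largest |grad| entries.

    Assumption: stationarity violation is measured by absolute gradient entries.
    """
    q = min(q, grad.size)
    idx = np.argpartition(-np.abs(grad), q - 1)[:q]
    return np.sort(idx)


def cubic_model(g, H, sigma, s):
    """Model of the objective change on the block: g^T s + 0.5 s^T H s + (sigma/6) ||s||^3.

    Assumption: the sigma/6 scaling; other conventions (e.g. sigma/3) exist.
    """
    return g @ s + 0.5 * (s @ H @ s) + sigma / 6.0 * np.linalg.norm(s) ** 3


def cauchy_step(g, H, sigma):
    """Minimize the cubic model along -g (a crude stand-in for an inexact minimizer).

    Along s = -t g the model derivative is a*t^2 + b*t - c with
    a = 0.5*sigma*||g||^3, b = g^T H g, c = ||g||^2; we take the positive root.
    Requires sigma > 0.
    """
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    a = 0.5 * sigma * gnorm ** 3
    b = g @ H @ g
    c = gnorm ** 2
    t = (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)
    return -t * g
```

Given a full gradient `grad` and Hessian `H`, one would call `I = select_block(grad, q)` and then `cauchy_step(grad[I], H[np.ix_(I, I)], sigma)` to obtain a step on the selected block. Whenever the block gradient is nonzero, this step yields a negative model value, which is the basic decrease property any approximate minimizer needs.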

Paper Structure (20 sections, 14 theorems, 84 equations, 3 figures, 1 table, 1 algorithm)


Introduction
Main contributions
Preliminaries and notations
The Inexact Block Cubic Newton (IBCN) method
Block selection
Cubic model
Approximate minimizers of the cubic model
Block update
Convergence analysis
Global convergence
Worst-case iteration complexity
Numerical experiments
Comparison with other block updates
Sparse least squares
Regularized logistic regression
...and 5 more sections

Key Result

Proposition 1

Given a point $x \in \mathbb{R}^n$ and a block of variable indices $\mathcal{I} \subseteq \{1,\ldots,n\}$, for all $s \in \mathbb{R}^{|\mathcal{I}|}$ we have that

Figures (3)

Figure 1: Results on sparse least squares using blocks of size $q$. In each plot, the $y$ axis is in logarithmic scale.
Figure 2: Results on $\ell_2$-regularized logistic regression with respect to the CPU time using blocks of size $q$ on the gisette dataset. In each plot, the $y$ axis is in logarithmic scale.
Figure 3: Objective error vs CPU time for sparse least squares using blocks of size $q$. In each plot, the $y$ axis is in logarithmic scale.

Theorems & Definitions (31)

Proposition 1
Proposition 2
proof
Proposition 3
proof
Proposition 4
proof
Lemma 5
proof
Proposition 6
...and 21 more
