A Stochastic Block-coordinate Proximal Newton Method for Nonconvex Composite Minimization

Hong Zhu, Xun Qian

Abstract

This paper presents a stochastic block-coordinate proximal Newton method for minimizing the sum of a blockwise Lipschitz-continuously differentiable function and a separable nonsmooth convex function. At each iteration, the method randomly selects one block and approximately solves a strongly convex regularized quadratic subproblem built from a second-order local model of the smooth part of the objective function, with a backtracking line search to ensure monotonicity of the objective. Under mild sampling assumptions, we show that its convergence properties match those of the inexact proximal Newton method. We further develop a line-search-free variant, where the strongly convex regularized quadratic subproblem is constructed using the Lipschitz constant of the gradient of the smooth component. For this variant, under a suitable parameter setting, we establish the global convergence rate of the residual mapping as well as the superlinear convergence rate of the iterates under the metric \(q\)-subregularity property with \(q > 1\) of the residual mapping for nonconvex composite problems. Under a suitable parameter setting, a more restrictive condition on the Hessian approximation, and the Hölderian error bound condition (\(q\in(0, 1]\)) of the residual mapping, we also prove the local superlinear/quadratic convergence rate of both the residual mapping and the iterates for convex composite problems. Finally, numerical experiments are conducted to demonstrate the effectiveness and convergence behavior of the proposed algorithm.
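The iteration described in the abstract (sample one block, approximately solve a regularized quadratic subproblem, then backtrack) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a lasso objective $\tfrac12\|Ax-b\|^2+\lambda\|x\|_1$ as the test problem, uniform block sampling, the exact block Hessian plus a small multiple `mu` of the identity as the strongly convex model, a few proximal gradient steps as the inexact subproblem solver, and an Armijo-type sufficient-decrease test. The function name `sbcpn_lasso` and all parameter names are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, lam, x):
    """Composite objective 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()


def sbcpn_lasso(A, b, lam, n_blocks=4, iters=200, mu=1e-3,
                inner_steps=20, sigma=1e-4, beta=0.5, seed=0):
    """Illustrative stochastic block-coordinate proximal Newton loop (assumed setup)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    blocks = np.array_split(np.arange(n), n_blocks)
    x = np.zeros(n)
    history = [objective(A, b, lam, x)]
    for _outer in range(iters):
        idx = blocks[rng.integers(n_blocks)]       # uniform random block (assumption)
        Ai = A[:, idx]
        xi = x[idx].copy()
        g = Ai.T @ (A @ x - b)                     # block gradient of the smooth part
        H = Ai.T @ Ai + mu * np.eye(len(idx))      # regularized block Hessian
        L = np.linalg.eigvalsh(H)[-1]              # largest eigenvalue: inner step size 1/L
        # Inexact solve of  min_z  g^T (z - xi) + 0.5 (z - xi)^T H (z - xi) + lam ||z||_1
        z = xi.copy()
        for _inner in range(inner_steps):
            grad_q = g + H @ (z - xi)
            z = soft_threshold(z - grad_q / L, lam / L)
        d = z - xi
        # Predicted decrease used in the Armijo-type test (nonpositive by construction)
        decrease = g @ d + lam * (np.abs(z).sum() - np.abs(xi).sum())
        f_old = history[-1]
        t = 1.0
        while True:
            x_new = x.copy()
            x_new[idx] = xi + t * d
            f_new = objective(A, b, lam, x_new)
            if f_new <= f_old + sigma * t * decrease or t < 1e-10:
                break
            t *= beta                              # backtrack
        if f_new <= f_old:                         # accept only monotone steps
            x = x_new
            history.append(f_new)
        else:
            history.append(f_old)
    return x, history
```

Calling `sbcpn_lasso(A, b, lam=0.1)` on a small random instance returns the final iterate and the recorded objective values, which are nonincreasing by construction. The line-search-free variant mentioned in the abstract would instead build the model from a Lipschitz constant of the gradient; that variant is not sketched here.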

Paper Structure

This paper contains 16 sections, 15 theorems, 124 equations, 4 figures.

Key Result

Proposition 2

Under Assumption assume:ncp (ii), we have

Figures (4)

  • Figure 1: Average performance of SBCPNM under different samplings over $10$ trials. Top line: SBCPNM_cycr; the second line: SBCPNM_cycrd; Bottom two lines: SBCPNM_topk.
  • Figure 2: Average performance of SBCPNM under different samplings over $10$ trials on dataset rcv1_sel. The top line: SBCPNM_r; the second line: SBCPNM_topk; the third line: SBCPNM_topk with $\mathbf{k} = \{50\%n, 40\%n, 30\%n\}$ for selected data with $m = 240$; the bottom line: SBCPNM_topk for selected data with $m = 2000$.
  • Figure 3: Average performance of SBCPNM under different samplings over $10$ trials on dataset real_sim_sel. The top line: SBCPNM_r; the second line: SBCPNM_topk; the third line: SBCPNM_topk with $\mathbf{k} = \{50\%n, 40\%n, 30\%n\}$ for selected data with $m = 180$; the bottom line: SBCPNM_topk for selected data with $m = 1500$.
  • Figure 4: Performance of SBCPNM and VM on dataset real_sim.

Theorems & Definitions (34)

  • Proposition 2
  • Remark 1
  • Remark 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Theorem 6
  • ...and 24 more