Efficiency of stochastic coordinate proximal gradient methods on nonseparable composite optimization

I. Necoara, F. Chorobura

TL;DR

The paper presents a probabilistic worst-case complexity analysis of the stochastic coordinate proximal gradient method in convex and nonconvex settings, and proves high-probability bounds on the number of iterations needed to reach a given optimality.

Abstract

This paper deals with composite optimization problems whose objective function is the sum of two terms: the first has a Lipschitz continuous gradient along random subspaces and may be nonconvex; the second is simple and differentiable, but possibly nonconvex and nonseparable. In this setting we design a stochastic coordinate proximal gradient method that takes into account the nonseparable composite form of the objective function. The algorithm achieves scalability by constructing, at each iteration, a local approximation model of the whole nonseparable objective function along a random subspace of user-determined dimension. We outline efficient techniques for selecting the random subspace, yielding an implementation that has a low per-iteration cost while also achieving fast convergence rates. We present a probabilistic worst-case complexity analysis for our stochastic coordinate proximal gradient method in convex and nonconvex settings; in particular, we prove high-probability bounds on the number of iterations before a given optimality is achieved. Extensive numerical results also confirm the efficiency of our algorithm.
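
To make the idea of a local model along a random subspace concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical test problem that does not appear in the paper: $f(x) = \tfrac12\|Ax-b\|^2$ as the smooth term and a nonseparable quadratic $\psi(x) = \tfrac12 x^\top Q x$ with $Q$ positive semidefinite as the second term, so that the $p$-dimensional subproblem has a closed-form solution. The function name, the uniform sampling of coordinates, and the use of the global constant $\|A\|_2^2$ as the Lipschitz bound are all illustrative assumptions.

```python
import numpy as np

def scpg_quadratic_sketch(A, b, Q, p, iters, seed=0):
    """Illustrative coordinate proximal gradient loop (assumed setup, see lead-in).

    Minimizes F(x) = 0.5*||Ax - b||^2 + 0.5*x^T Q x by updating p randomly
    chosen coordinates per iteration.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Global Lipschitz constant of grad f; it upper-bounds the constant
    # along any coordinate subspace (a conservative, assumed choice).
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)

    def F(z):
        return 0.5 * np.sum((A @ z - b) ** 2) + 0.5 * z @ Q @ z

    history = [F(x)]
    for _ in range(iters):
        # Random coordinate subspace of user-chosen dimension p.
        S = rng.choice(n, size=p, replace=False)
        g_S = A[:, S].T @ (A @ x - b)   # partial gradient of f on S
        q_S = Q[S] @ x                  # partial gradient of psi on S
        # Minimize over d in R^p the local model
        #   <g_S, d> + (L/2)||d||^2 + psi(x + U_S d),
        # which keeps psi exact; its optimality condition is linear in d.
        H = L * np.eye(p) + Q[np.ix_(S, S)]
        d = np.linalg.solve(H, -(g_S + q_S))
        x[S] += d
        history.append(F(x))
    return x, history
```

Because the model upper-bounds $F$ along the chosen subspace and is minimized exactly, the recorded objective values in `history` are non-increasing in this sketch. For a general differentiable $\psi$ the subproblem has no closed form and would need an inner solver.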

Paper Structure

This paper contains 15 sections, 11 theorems, 154 equations, 3 figures, 2 tables.

Key Result

Lemma 1

If Assumption [A1] holds, then we have the relation:

Figures (3)

  • Figure 1: Comparison between SCPG and the power method for the smallest eigenvalue on group SchenkIBMNA. Matrices: c-50 (left), n=22401, and c-54 (right), n=31793.
  • Figure 2: Comparison between SCPG and the power method for the smallest eigenvalue on group SchenkIBMNA. Matrices: c-61 (left), n=43618, and c-65 (right), n=48066.
  • Figure 3: Solving a logistic regression problem on the duke-breast-cancer dataset with cubic Newton, using SCPG, the algorithm in CarDuc:17, and RCGD with Armijo line search in Bon:21 as subroutines: the regularization parameter $\lambda$ is $0.01$ (left) and $0.0001$ (right), with three values of $p$.

Theorems & Definitions (18)

  • Lemma 1
  • Lemma 2
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 8 more