
Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization

Benjamin Dubois-Taine, Francis Bach, Quentin Berthet, Adrien Taylor

TL;DR

The paper shows that, when minimizing a smooth convex function on a bounded domain, one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\epsilon})$ computing units in parallel.
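The TL;DR mentions a linear maximization oracle (LMO) run on many computing units in parallel. The sketch below is only an illustration under an assumption: that the parallel units evaluate independent LMO calls at randomly perturbed inputs whose outputs are then averaged. This may not be the paper's exact construction. On the probability simplex, the average of LMO outputs at $\theta + \sigma Z$, with $Z \sim \mathcal{N}(0, I)$, is an unbiased estimate of the gradient of the Gaussian-smoothed support function $\theta \mapsto \mathbb{E}[\max_{x \in \Delta} \langle \theta + \sigma Z, x \rangle]$. Averaging $m$ independent calls divides the variance of that estimate by $m$. The names `simplex_lmo`, `smoothed_lmo`, `sigma` and `num_units` are hypothetical.

```python
import numpy as np


def simplex_lmo(theta):
    """Linear maximization oracle over the probability simplex.

    argmax_{x in simplex} <theta, x> is the vertex e_i with i = argmax_i theta_i.
    """
    x = np.zeros_like(theta)
    x[np.argmax(theta)] = 1.0
    return x


def smoothed_lmo(theta, sigma, num_units, rng):
    """Monte-Carlo estimate of E[simplex_lmo(theta + sigma * Z)], Z ~ N(0, I).

    The num_units perturbations are independent, so each LMO call could be
    dispatched to its own computing unit; here they are evaluated sequentially.
    """
    z = rng.standard_normal((num_units, theta.shape[0]))
    calls = [simplex_lmo(theta + sigma * zi) for zi in z]
    # Mean of one-hot vertices: a point of the simplex estimating the smoothed gradient.
    return np.mean(calls, axis=0)


rng = np.random.default_rng(0)
estimate = smoothed_lmo(np.array([1.0, 0.9, -2.0]), sigma=0.5, num_units=1000, rng=rng)
```

With `sigma=0`, every call returns the same vertex and the plain LMO is recovered. With `sigma > 0`, nearly tied coordinates (here the first two) share the mass.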

Abstract

We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe achieving acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\epsilon})$ computing units in parallel. We illustrate this fast convergence on synthetic numerical experiments.
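For reference, the vanilla Frank-Wolfe baseline used in Figure 2 can be sketched on a least-squares problem over the probability simplex. This is a minimal sketch with several assumptions. It uses the classical step size $2/(k+2)$. It reports the Frank-Wolfe gap $\langle \nabla f(x_k), x_k - s_k \rangle$, which upper-bounds $f(x_k) - f^\star$ by convexity. The paper's primal-dual gap may be defined differently. The function names are hypothetical. Following the paper's wording, the oracle is a linear *maximization* oracle, so it is called on $-\nabla f(x_k)$.

```python
import numpy as np


def simplex_lmo(theta):
    """argmax over the simplex of <theta, s>: the vertex with the largest coordinate."""
    s = np.zeros_like(theta)
    s[np.argmax(theta)] = 1.0
    return s


def frank_wolfe_least_squares(A, b, num_iters):
    """Vanilla Frank-Wolfe for min_x 0.5 * ||A x - b||^2 over the probability simplex.

    Returns the last iterate and, for each iteration, the best Frank-Wolfe gap seen so far.
    """
    n = A.shape[1]
    x = np.full(n, 1.0 / n)  # start at the barycenter of the simplex
    best, best_gaps = np.inf, []
    for k in range(num_iters):
        grad = A.T @ (A @ x - b)
        s = simplex_lmo(-grad)   # minimizing <grad, s> is maximizing <-grad, s>
        gap = grad @ (x - s)     # nonnegative; upper-bounds f(x) - f* by convexity
        best = min(best, gap)
        best_gaps.append(best)
        gamma = 2.0 / (k + 2.0)  # classical open-loop step size
        x = x + gamma * (s - x)  # convex combination keeps x in the simplex
    return x, np.array(best_gaps)


rng = np.random.default_rng(0)
x, best_gaps = frank_wolfe_least_squares(rng.standard_normal((20, 10)), rng.standard_normal(20), 500)
```

The returned running minimum corresponds to the "best gap so far" curves described in the figure captions. Only one LMO call is made per iteration, which is the baseline that the parallel variant improves upon.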

Paper Structure (23 sections, 9 theorems, 101 equations, 2 figures, 3 algorithms)

Key Result

Proposition 1

$\Psi : \mathbf{V} \rightarrow \mathbb{R}$ is $L$-smooth w.r.t. $\left\|\cdot\right\|$ if and only if for all $x, y \in \mathbf{V}$,

Figures (2)

  • Figure 1: Comparisons between the behavior of \ref{alg:AFW} and that of its theoretical upper bound (see \ref{thm:AFW-result}) on a least-squares problem on the simplex for $\alpha = 10^{-2}$ (left) and $\alpha = 10^{-3}$ (right). The plots report the value of the best primal-dual gap incurred up to the current iteration.
  • Figure 2: Comparisons between Frank-Wolfe and the restarting scheme (\ref{alg:R-PFW}): a least-squares problem on the simplex ((a) and (b)), and a generalized matrix completion problem on the trace ball ((c) and (d)). The plots report the value of the best primal-dual gap incurred up to the current iteration.
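The matrix completion experiment in Figure 2 runs on a trace (nuclear-norm) ball. The caption does not say how the linear maximization oracle is implemented there, so the sketch assumes the standard choice, which only needs the top singular pair of the input: $\arg\max_{\|X\|_* \le r} \langle G, X \rangle = r\, u_1 v_1^\top$, with optimal value $r\, \sigma_{\max}(G)$. The function name `trace_ball_lmo` and the parameter `radius` are hypothetical.

```python
import numpy as np


def trace_ball_lmo(G, radius):
    """argmax of <G, X> = trace(G^T X) over {X : ||X||_* <= radius}.

    The maximizer is the rank-one matrix radius * u1 v1^T built from the top
    singular pair of G; the optimal value is radius * sigma_max(G).
    """
    U, _, Vt = np.linalg.svd(G, full_matrices=False)  # singular values sorted descending
    return radius * np.outer(U[:, 0], Vt[0, :])


rng = np.random.default_rng(1)
X = trace_ball_lmo(rng.standard_normal((5, 4)), radius=2.0)
```

A full SVD is used here for clarity. For large matrices, a truncated solver such as `scipy.sparse.linalg.svds(G, k=1)` computes only the leading pair. That is why this oracle is typically much cheaper than a Euclidean projection onto the trace ball, which needs the full spectrum.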

Theorems & Definitions (22)

  • Definition 1
  • Proposition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Remark 1
  • ...and 12 more