Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization

Benjamin Dubois-Taine; Francis Bach; Quentin Berthet; Adrien Taylor

TL;DR

It is shown that when minimizing a smooth convex function on a bounded domain, one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\epsilon})$ computing units in parallel.

Abstract

We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe achieving acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\epsilon})$ computing units in parallel. We illustrate this fast convergence on synthetic numerical experiments.
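For orientation, the two oracles named in the abstract (a gradient and a linear oracle over the domain) can be illustrated with the classical, non-accelerated Frank-Wolfe method. The sketch below is not the paper's accelerated or parallel algorithm. It assumes a least-squares objective on the probability simplex, the standard step size $2/(k+2)$, and hypothetical helper names (`lmo_simplex`, `frank_wolfe`). The oracle is written as a minimization of $\langle \nabla f(x), s \rangle$, which is the same as maximizing $\langle -\nabla f(x), s \rangle$.

```python
import numpy as np


def lmo_simplex(grad):
    """Linear oracle on the probability simplex.

    Returns the vertex e_i minimizing <grad, s> over the simplex,
    i.e. the coordinate with the smallest gradient entry.
    """
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s


def frank_wolfe(A, b, n_iters=200):
    """Classical Frank-Wolfe for f(x) = 0.5 * ||A x - b||^2 on the simplex.

    Tracks the best Frank-Wolfe (primal-dual) gap seen so far:
    gap_k = <grad f(x_k), x_k - s_k>, an upper bound on f(x_k) - f*.
    """
    n = A.shape[1]
    x = np.full(n, 1.0 / n)          # start at the barycenter (feasible)
    best_gap = np.inf
    for k in range(n_iters):
        grad = A.T @ (A @ x - b)     # gradient oracle
        s = lmo_simplex(grad)        # linear oracle
        gap = grad @ (x - s)         # Frank-Wolfe gap at x_k
        best_gap = min(best_gap, gap)
        x = x + (2.0 / (k + 2.0)) * (s - x)   # convex combination stays feasible
    return x, best_gap


# Synthetic data (an assumption for illustration, not the paper's setup).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x, best_gap = frank_wolfe(A, b)
```

Each iteration here makes one gradient call and one linear-oracle call, and the classical method needs on the order of $1/\epsilon$ iterations to reach an $\epsilon$ gap. The abstract's claim is that the iteration count can be reduced to $\tilde{O}(1/\sqrt{\epsilon})$ when $O(1/\sqrt{\epsilon})$ computing units are available in parallel.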

Paper Structure (23 sections, 9 theorems, 101 equations, 2 figures, 3 algorithms)

Introduction
Related Work
Notation and definitions
Fast Stochastic Composite Minimization
Accelerating Frank-Wolfe with parallelization
Experiments
Conclusion
Acknowledgements
Proofs for Stochastic Composite Minimization
Proof of
Proof of
Proofs for Accelerated Frank-Wolfe
Proof of
Proof of feasibility
Proof of dual gap convergence
...and 8 more sections

Key Result

Proposition 1

$\Psi : \mathbf{V} \rightarrow \mathbb{R}$ is $L$-smooth w.r.t. $\left\|\cdot\right\|$ if and only if for all $x, y \in \mathbf{V}$,

Figures (2)

Figure 1: Comparisons between the behavior of \ref{alg:AFW} and that of its theoretical upper bound (see \ref{thm:AFW-result}) on a least-squares problem on the simplex for $\alpha = 10^{-2}$ (left) and $\alpha = 10^{-3}$ (right). The plots report the value of the best primal-dual gap incurred at the current iteration.
Figure 2: Comparisons between Frank-Wolfe and the restarting scheme (\ref{alg:R-PFW}): a least-squares problem on the simplex ((a) and (b)), and a generalized matrix completion problem on the trace ball ((c) and (d)). The plots report the value of the best primal-dual gap incurred at the current iteration.

Theorems & Definitions (22)

Definition 1
Proposition 1
Definition 2
Definition 3
Definition 4
Definition 5
Proposition 2
Proposition 3
Theorem 1
Remark 1
...and 12 more
