General Optimal Step-size for ADMM-type Algorithms: Domain Parametrization and Optimal Rates

Yifan Ran

TL;DR

This work solves a 49-year-old open problem: the general optimal step-size for ADMM-type algorithms. Numerically, the proposed step-size performs almost identically to the theoretical one after a few iterations, and similarly to the best fixed step-size found by exhaustive grid search.

Abstract

In this work, we solve a 49-year-old open problem: the general optimal step-size for ADMM-type algorithms. For the convex program $\text{min.} \,\, f({x}) + g({z})$, $\text{s.t.}\, {A}{x} - {B}{z} = {c}$, and an arbitrary fixed-point initialization $ζ^0$, an optimal step-size is given by a root of the following polynomial: \begin{equation*} ρ^4\Vert {A}{x}^\star\Vert^2 - ρ^3\langle {A}{x}^\star, ζ^0\rangle + ρ\langle λ^\star,ζ^0\rangle - \Vertλ^\star\Vert^2 = 0, \end{equation*} where $ρ\neq 0$ is a domain step-size, related to the classical positive step-size via $γ= ρ^2$. Here $\cdot^\star$ denotes an optimal solution and $λ$ denotes the Lagrange multiplier (dual variable) associated with the equality constraint. The polynomial always admits a closed-form solution. Optimality is in the sense that a worst-case fixed-point convergence rate is minimized; this rate balances the convergence speeds of the normalized primal and dual iterates, which are reciprocally related. When either the primal or the dual solution is trivial (a zero vector), the rate can be improved by accelerating only the non-trivial sequence. In practice, the unknown optimal solutions are replaced adaptively by the current iterates, which are available at every iteration. Numerically, the adaptive choice performs almost identically to the theoretical one after a few iterations, and similarly to the best fixed step-size found by exhaustive grid search.
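To make the polynomial in the abstract concrete, the following is a minimal numerical sketch, not the paper's implementation. It assumes the vectors ${A}{x}^\star$, $λ^\star$, and $ζ^0$ are available as NumPy arrays (in practice they would be replaced by current iterates, as the abstract suggests), and it uses a generic numerical root finder in place of the closed-form solution mentioned above. The function name `domain_step_size_candidates` and the tolerance `tol` are hypothetical. Because the abstract only says that an optimal choice is "a root", the sketch returns every real nonzero root and leaves the selection rule to the paper's analysis.

```python
import numpy as np


def domain_step_size_candidates(Ax_star, lam_star, zeta0, tol=1e-9):
    """Return the real, nonzero roots rho of
        rho^4 ||Ax*||^2 - rho^3 <Ax*, zeta0> + rho <lam*, zeta0> - ||lam*||^2 = 0.

    Hypothetical helper: assumes Ax* and lam* are both nonzero vectors
    (the abstract treats the trivial cases separately).
    """
    Ax_star = np.asarray(Ax_star, dtype=float)
    lam_star = np.asarray(lam_star, dtype=float)
    zeta0 = np.asarray(zeta0, dtype=float)

    a = Ax_star @ Ax_star    # ||A x*||^2
    b = Ax_star @ zeta0      # <A x*, zeta^0>
    c = lam_star @ zeta0     # <lambda*, zeta^0>
    d = lam_star @ lam_star  # ||lambda*||^2

    # Coefficients in descending powers of rho; the rho^2 term is absent.
    roots = np.roots([a, -b, 0.0, c, -d])

    # Keep numerically real roots, then drop rho = 0 (excluded by rho != 0).
    is_real = np.abs(roots.imag) <= tol * np.maximum(1.0, np.abs(roots))
    real_roots = roots[is_real].real
    return np.sort(real_roots[np.abs(real_roots) > tol])


# Example: zero initialization, so the two middle coefficients vanish.
rho = domain_step_size_candidates([3.0, 4.0], [0.0, 2.0], [0.0, 0.0])
gamma = rho ** 2  # classical positive step-size, gamma = rho^2
```

With $ζ^0 = 0$ the polynomial reduces to $ρ^4\Vert {A}{x}^\star\Vert^2 = \Vertλ^\star\Vert^2$, so $γ = ρ^2 = \Vertλ^\star\Vert / \Vert {A}{x}^\star\Vert$. When both norms are nonzero, the polynomial is negative at $ρ = 0$ and grows without bound as $|ρ| \to \infty$, so at least one positive and one negative real root exist.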

Paper Structure

This paper contains 62 sections, 29 theorems, 179 equations, 11 figures, 3 algorithms.

Introduction
Notations
ADMM algorithm
Key results
open problem (theoretical)
practical use
unscaled fixed-point
Some optimal rates
Organization
Two parametrizations: range and domain types
Fundamental ground
Classical range parametrization
Domain parametrization
Extra scaling
parallelism
...and 47 more sections

Key Result

Proposition 2.1

Given $f \in \Gamma_0 (\mathbb H)$, $\text{dom}(f) \neq \emptyset$ and scalar $\rho \neq 0$, the domain-parametrized proximal operator is

Figures (11)

Figure 1: (Fixed step-size) Theoretical results, see Proposition \ref{pro_non}.
Figure 3: Full warm-start with $\epsilon_{\text{err}1} \sim \mathcal{N}(0, 10^{-3})$, $\epsilon_{\text{err}2} \sim \mathcal{N}(0, 10^{-1})$.
Figure 4: Partial warm-start with $\epsilon_{\text{err}1} \sim \mathcal{N}(0, 1)$.
Figure 6: Fixed rate for primal iterate, independent of step-size choices.
Figure 7: (Adaptive) Primal warm-start: $\bm{x}^0 = \bm{x}^\star + \epsilon_{\text{err}1}, \, \bm{\lambda}^0 = \bm{0}$.
...and 6 more figures

Theorems & Definitions (57)

Definition 1.1
Definition 2.1
Definition 2.2
Proposition 2.1
proof
Lemma 2.1: translation rule
proof
Lemma 2.2
proof
Definition 2.3
...and 47 more
