
General Optimal Step-size for ADMM-type Algorithms: Domain Parametrization and Optimal Rates

Yifan Ran

TL;DR

This work solves a 49-year-old open problem: the general optimal step-size for ADMM-type algorithms. Numerically, the adaptive step-size performs almost identically to the theoretical one after a few iterations, and similarly to the best fixed step-size found by exhaustive grid search.

Abstract

In this work, we solve a 49-year-old open problem: the general optimal step-size for ADMM-type algorithms. For a convex program $\min \; f(x) + g(z)$ $\text{s.t.}\; Ax - Bz = c$, given an arbitrary fixed-point initialization $\zeta^0$, an optimal step-size choice is given by a root of the following polynomial: \begin{equation*} \rho^4\Vert Ax^\star\Vert^2 - \rho^3\langle Ax^\star, \zeta^0\rangle + \rho\langle \lambda^\star, \zeta^0\rangle - \Vert\lambda^\star\Vert^2 = 0, \end{equation*} where $\rho \neq 0$ is a domain step-size, related to the classical positive step-size via $\gamma = \rho^2$. We denote by $\cdot^\star$ an optimal solution and by $\lambda$ the Lagrange multiplier associated with the equality constraint (the dual variable). The above polynomial always admits a closed-form solution. Optimality here means that a worst-case fixed-point convergence rate is minimized; this rate balances the convergence speeds of the normalized primal and dual iterates, which are reciprocally related. When either the primal or the dual solution is trivial (a zero vector), an improvement can be made by accelerating the non-trivial sequence only. In practice, the unknown optimal solutions are replaced adaptively by the current iterates, which are known at every iteration. Numerically, this adaptive step-size performs almost identically to the theoretical one after a few iterations, and similarly to the best fixed step-size found by exhaustive grid search.
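As a rough illustration of the step-size polynomial in the abstract, the following Python sketch computes its real, nonzero roots numerically and maps each to the classical step-size via $\gamma = \rho^2$. The function name `domain_step_size_candidates`, the tolerance, and the example vectors are hypothetical and not taken from the paper. The sketch uses a generic numerical root finder rather than the closed-form solution the abstract mentions, and it does not decide which root is the optimal one, since the abstract does not state that rule.

```python
import numpy as np

def domain_step_size_candidates(Ax_star, lam_star, zeta0, tol=1e-9):
    """Real, nonzero roots rho of
         rho^4 ||Ax*||^2 - rho^3 <Ax*, zeta0> + rho <lam*, zeta0> - ||lam*||^2 = 0.
    Hypothetical helper: inputs are 1-D arrays standing in for A x*, lambda*, zeta^0.
    """
    a = float(np.dot(Ax_star, Ax_star))    # ||A x*||^2
    b = float(np.dot(Ax_star, zeta0))      # <A x*, zeta^0>
    c = float(np.dot(lam_star, zeta0))     # <lambda*, zeta^0>
    d = float(np.dot(lam_star, lam_star))  # ||lambda*||^2

    # Coefficients in descending powers of rho; the rho^2 term is absent.
    roots = np.roots([a, -b, 0.0, c, -d])

    # Keep roots that are numerically real and nonzero (rho != 0 is required).
    is_real = np.abs(roots.imag) <= tol * (1.0 + np.abs(roots))
    real = roots[is_real].real
    return np.sort(real[np.abs(real) > tol])

# Made-up example data (assumption, for illustration only).
Ax_star = np.array([1.0, 2.0])
lam_star = np.array([0.5, -1.0])
zeta0 = np.array([1.0, 1.0])

rhos = domain_step_size_candidates(Ax_star, lam_star, zeta0)
gammas = rhos ** 2  # classical positive step-size, gamma = rho^2
print("rho candidates:  ", rhos)
print("gamma candidates:", gammas)
```

When both $\Vert Ax^\star\Vert$ and $\Vert\lambda^\star\Vert$ are nonzero, the polynomial is negative at $\rho = 0$ and grows without bound as $\rho \to \pm\infty$, so at least one positive and one negative real root exist. For the adaptive variant described in the abstract, the same function could be called with the current iterates in place of the starred quantities; how exactly the paper does this is not shown here.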

Paper Structure (62 sections, 29 theorems, 179 equations, 11 figures, 3 algorithms)


Key Result

Proposition 2.1

Given $f \in \Gamma_0 (\mathbb H)$, $\text{dom}(f) \neq \emptyset$ and scalar $\rho \neq 0$, the domain-parametrized proximal operator is

Figures (11)

  • Figure 1: (Fixed step-size) Theoretical results, see Proposition \ref{pro_non}.
  • Figure 3: Full warm-start with $\epsilon_{\text{err}1} \sim \mathcal{N}(0, 10^{-3})$, $\epsilon_{\text{err}2} \sim \mathcal{N}(0, 10^{-1})$.
  • Figure 4: Partial warm-start with $\epsilon_{\text{err}1} \sim \mathcal{N}(0, 1)$.
  • Figure 6: Fixed rate for primal iterate, independent of step-size choices.
  • Figure 7: (Adaptive) Primal warm-start: $\bm{x}^0 = \bm{x}^\star + \epsilon_{\text{err}1}, \, \bm{\lambda}^0 = \bm{0}$.
  • ...and 6 more figures

Theorems & Definitions (57)

  • Definition 1.1
  • Definition 2.1
  • Definition 2.2
  • Proposition 2.1
  • proof
  • Lemma 2.1: translation rule
  • proof
  • Lemma 2.2
  • proof
  • Definition 2.3
  • ...and 47 more