Interpolation Conditions for Linear Operators and Applications to Performance Estimation Problems

Nizar Bousselmi; Julien M. Hendrickx; François Glineur


TL;DR

This work generalizes the Performance Estimation Problem framework to first-order methods involving linear operators, and obtains new exact worst-case convergence rates for several performance criteria, including average and last iterate accuracy.

Abstract

The Performance Estimation Problem methodology makes it possible to determine the exact worst-case performance of an optimization method. In this work, we generalize this framework to first-order methods involving linear operators. This extension requires an explicit formulation of interpolation conditions for those linear operators. We consider the class of linear operators $\mathcal{M}:x \mapsto Mx$ where matrix $M$ has bounded singular values, and the class of linear operators where $M$ is symmetric and has bounded eigenvalues. We describe interpolation conditions for these classes, i.e. necessary and sufficient conditions that, given a list of pairs $\{(x_i,y_i)\}$, characterize the existence of a linear operator mapping $x_i$ to $y_i$ for all $i$. Using these conditions, we first identify the exact worst-case behavior of the gradient method applied to the composed objective $h\circ \mathcal{M}$, and observe that it always corresponds to $\mathcal{M}$ being a scaling operator. We then investigate the Chambolle-Pock method applied to $f+g\circ \mathcal{M}$, and improve the existing analysis to obtain a proof of the exact convergence rate of the primal-dual gap. In addition, we study how this method behaves on Lipschitz convex functions, and obtain a numerical convergence rate for the primal accuracy of the last iterate. We also show numerically that averaging iterates is beneficial in this setting.
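To illustrate what an interpolation condition for a linear operator looks like, the short derivation below gives one necessary condition for the class with bounded singular values. It is only a sketch: the matrices $X = (x_1 \cdots x_N)$ and $Y = (y_1 \cdots y_N)$, which collect the pairs as columns, and the bound $L$ on the largest singular value of $M$ are notation assumed here for illustration, and the paper's complete necessary and sufficient conditions are not reproduced.

```latex
% Assumed notation: X = [x_1 ... x_N], Y = [y_1 ... y_N], and \|M\| \le L.
\begin{align*}
Y = MX \ \text{and}\ \|M\| \le L
&\implies \|Ya\|^2 = \|M(Xa)\|^2 \le L^2 \|Xa\|^2 \quad \forall a \in \mathbb{R}^N \\
&\iff a^T \left( L^2 X^T X - Y^T Y \right) a \ge 0 \quad \forall a \in \mathbb{R}^N \\
&\iff Y^T Y \preceq L^2 X^T X .
\end{align*}
```

The resulting condition involves the data only through the Gram matrices $X^TX$ and $Y^TY$, which is the kind of structure a semidefinite formulation of a performance estimation problem can work with.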


Paper Structure (30 sections, 21 theorems, 62 equations, 4 figures, 1 table)


Introduction
Optimization methods involving linear operators
Outline of the paper
Prior PEP work
PEP formulation
Interpolation conditions
Classes of linear operators
Interpolation conditions for linear operators
Main results
Proofs of the main results
$\mathcal{L}_{L}$-interpolability of $(X_R,Y_R,U_R,V_R)$ (Step 1)
Rotation to $(X,Y,U,V)$ (Step 2)
Proof of Theorem 3.1
Proof of Theorem 3.3
Limiting cases
...and 15 more sections

Key Result

Theorem 2.2

Let $0 \leq \mu < L$ and consider the class $\mathcal{F}_{\mu,L}$. The set of triplets $\{(x_i,g_i,f_i)\}_{i\in [N]}$ is $\mathcal{F}_{\mu,L}$-interpolable if, and only if, $\forall (i,j)\in [N]^2$
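The inequality of this theorem, as it is commonly stated in the cited reference (taylor2017smooth, Theorem 4), is

$$f_i \geq f_j + g_j^T(x_i-x_j) + \frac{1}{2(1-\mu/L)}\left(\frac{1}{L}\left\lVert g_i-g_j\right\rVert^2 + \mu\left\lVert x_i-x_j\right\rVert^2 - 2\frac{\mu}{L}(g_j-g_i)^T(x_j-x_i)\right).$$

A minimal sketch of how such a condition can be checked numerically is given below. The function name `is_interpolable`, the tolerance `tol`, and the list-of-tuples data layout are assumptions made for this illustration, and the sketch assumes a finite $L$.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def sub(a, b):
    """Componentwise difference a - b."""
    return [p - q for p, q in zip(a, b)]


def is_interpolable(triplets, mu, L, tol=1e-9):
    """Check the pairwise smooth strongly convex interpolation inequality.

    triplets: list of (x_i, g_i, f_i), with x_i and g_i sequences of floats.
    mu, L: constants with 0 <= mu < L and L finite (assumption of this sketch).
    tol: hypothetical numerical tolerance, not part of the theorem.
    """
    for i, (xi, gi, fi) in enumerate(triplets):
        for j, (xj, gj, fj) in enumerate(triplets):
            if i == j:
                continue
            dx = sub(xi, xj)
            dg = sub(gi, gj)
            # (g_j - g_i)^T (x_j - x_i) equals dg^T dx because both signs flip.
            quad = dot(dg, dg) / L + mu * dot(dx, dx) - 2 * (mu / L) * dot(dg, dx)
            rhs = fj + dot(gj, dx) + quad / (2 * (1 - mu / L))
            if fi < rhs - tol:
                return False
    return True


def sample_quadratic(c, points):
    """Triplets (x, gradient, value) sampled from f(x) = c/2 * x^2 in 1D."""
    return [([x], [c * x], 0.5 * c * x * x) for x in points]
```

For instance, samples of $f(x) = \frac{c}{2}x^2$ pass the check for $\mu = 0.1$, $L = 1$ when $c = 0.5$ and fail when $c = 2$, since the inequality reduces to $(c-\mu)(L-c) \geq 0$ for such data.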

Figures (4)

Figure 1: Worst-case performance of 10 iterations of the gradient method (GM) for varying step size $h\in[0,2]$ on the classes $\mathcal{F}_{0}$ of $1$-smooth convex functions $f$ (dotted red line), $\mathcal{C}_{0.1}^{0}$ of $1$-smooth $0.1$-strongly convex functions $g \circ M$ (solid blue line), and $\mathcal{F}_{0.1}$ of $1$-smooth $0.1$-strongly convex functions $f$ (dashed black line).
Figure 2: Worst-case performance obtained by our extension of PEP for $N$ iterations of the Chambolle-Pock algorithm (CP) with step sizes satisfying $\tau\sigma L_M^2 \leq 1$ on the problem $\min_x F(x)$, where $F = f + g \circ M$, $f$ and $g$ are convex proximable, and $M$ is such that $0\leq \left\lVert M\right\rVert\leq 1$. The performance criterion is the primal-dual gap $\mathcal{L}(\bar{x}_N,u) - \mathcal{L}(x,\bar{u}_N)$ and the initial distance is $R^2 = \frac{\left\lVert x-x_0\right\rVert^2}{\tau} + \frac{\left\lVert u-u_0\right\rVert^2}{\sigma} - 2 (u-u_0)^TM(x-x_0)$. PEP results (blue dots) are compared to bound \ref{eq:th_CP16} of Theorem \ref{th:CP16} (red line).
Figure 3: Worst-case performance obtained by our extension of PEP for $N$ iterations of the Chambolle-Pock algorithm (CP) with different step sizes $\tau =\sigma$ on the problem $\min_x F(x)$, where $F = f + g \circ M$, $f$ and $g$ are convex proximable, and $M$ is such that $0\leq \left\lVert M\right\rVert\leq 1$. The performance criterion is the primal-dual gap $\mathcal{L}(\bar{x}_N,u) - \mathcal{L}(x,\bar{u}_N)$ and the initial distance is $R_0^2 = \left\lVert x-x_0\right\rVert^2 + \left\lVert u-u_0\right\rVert^2$.
Figure 4: Worst-case performance obtained by our extension of PEP for $N$ iterations of the Chambolle-Pock algorithm (CP) with step size parameters $\tau =\sigma = 1$ on the problem $\min_x F(x)$, where $F = f + g \circ M$, $f$ and $g$ are $1$-Lipschitz convex proximable, and $M$ is such that $0\leq \left\lVert M\right\rVert\leq 1$. The performance criterion is the objective function accuracy of the average (blue dots), last (red squares), best (green dots), last $\frac{N}{2}$ (magenta dots), and weighted sum (black dots) of iterates. The curves $\frac{5}{N}$ (solid black line) and $\frac{1}{\sqrt{N}}$ (dashed black line) are also shown for comparison.
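The Chambolle-Pock iteration evaluated in Figures 2 to 4 can be sketched on a scalar toy problem. The update rule below is the standard form from the literature with extrapolation parameter 1; it is assumed, not confirmed, to match the paper's (CP). The toy problem $\min_x \frac{1}{2}(x-a)^2 + |mx|$, the function name `chambolle_pock_toy`, and the starting point $x_0 = u_0 = 0$ are choices made for this illustration only.

```python
def chambolle_pock_toy(a, m, tau, sigma, n_iter):
    """Run Chambolle-Pock on min_x f(x) + g(m*x) with scalar x.

    Toy instance (an assumption of this sketch, not taken from the paper):
      f(x) = 0.5 * (x - a)**2   ->  prox_{tau f}(v) = (v + tau * a) / (1 + tau)
      g(y) = |y|                ->  prox_{sigma g*}(w) = clip(w, -1, 1)
    The step sizes should satisfy tau * sigma * m**2 <= 1.
    Returns the last primal iterate and the average of the primal iterates.
    """
    x, u = 0.0, 0.0
    x_sum = 0.0
    for _ in range(n_iter):
        # Primal step: proximal step on f at x - tau * M^T u.
        x_new = (x - tau * m * u + tau * a) / (1.0 + tau)
        # Dual step: proximal step on g* at u + sigma * M (2 x_new - x).
        w = u + sigma * m * (2.0 * x_new - x)
        u = max(-1.0, min(1.0, w))
        x = x_new
        x_sum += x
    return x, x_sum / n_iter


if __name__ == "__main__":
    x_last, x_avg = chambolle_pock_toy(a=3.0, m=1.0, tau=1.0, sigma=1.0, n_iter=50)
    print(x_last, x_avg)
```

With $a = 3$ and $m = 1$ the minimizer is $x^\star = 2$, and both returned values approach it. This toy instance says nothing about worst-case rates, which is what the figures report.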

Theorems & Definitions (41)

Definition 2.1: $\mathcal{F}$-interpolability
Theorem 2.2: taylor2017smooth, Theorem 4
Definition 2.3: $\mathcal{L}_{L}$-interpolability
Definition 2.4: $\mathcal{S}_{\mu,L}$-interpolability
Definition 2.5: $\mathcal{T}_{L}$-interpolability
Theorem 3.1: $\mathcal{L}_{L}$-interpolation conditions
Corollary 3.2: $\mathcal{T}_{L}$-interpolation conditions
Theorem 3.3: $\mathcal{S}_{\mu,L}$-interpolation conditions
Lemma 3.4: Existence of $\mathcal{L}_{L}$-interpolable factorizations of Gram matrices
Proof 1
...and 31 more
