Interpolation Conditions for Linear Operators and Applications to Performance Estimation Problems

Nizar Bousselmi, Julien M. Hendrickx, François Glineur

TL;DR

This work generalizes the Performance Estimation Problem framework to first-order methods involving linear operators, and obtains new exact worst-case convergence rates for several performance criteria, including the accuracy of the average and of the last iterate.

Abstract

The Performance Estimation Problem methodology makes it possible to determine the exact worst-case performance of an optimization method. In this work, we generalize this framework to first-order methods involving linear operators. This extension requires an explicit formulation of interpolation conditions for those linear operators. We consider the class of linear operators $\mathcal{M}:x \mapsto Mx$ where the matrix $M$ has bounded singular values, and the class of linear operators where $M$ is symmetric and has bounded eigenvalues. We describe interpolation conditions for these classes, i.e., necessary and sufficient conditions that, given a list of pairs $\{(x_i,y_i)\}$, characterize the existence of a linear operator mapping $x_i$ to $y_i$ for all $i$. Using these conditions, we first identify the exact worst-case behavior of the gradient method applied to the composed objective $h\circ \mathcal{M}$, and observe that it always corresponds to $\mathcal{M}$ being a scaling operator. We then investigate the Chambolle-Pock method applied to $f+g\circ \mathcal{M}$, and improve the existing analysis to obtain a proof of the exact convergence rate of the primal-dual gap. In addition, we study how this method behaves on Lipschitz convex functions, and obtain a numerical convergence rate for the primal accuracy of the last iterate. We also show numerically that averaging iterates is beneficial in this setting.
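
As a concrete reading of the first application, the sketch below runs the gradient method on a composed objective $h\circ\mathcal{M}$, using the chain rule $\nabla(h\circ\mathcal{M})(x) = M^T\nabla h(Mx)$. The test function $h(z)=\frac{1}{2}\lVert z\rVert^2$, the parameter names `step` and `n_iter`, and the helper `gradient_method_composed` are illustrative assumptions, not taken from the paper; this is a plain simulation of the iteration, not a PEP worst-case computation.

```python
import numpy as np

def gradient_method_composed(grad_h, M, x0, step, n_iter):
    """Hypothetical helper: n_iter steps of x <- x - step * M^T grad_h(M x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # Chain rule: the gradient of x -> h(M x) is M^T applied to grad h at M x.
        x = x - step * (M.T @ grad_h(M @ x))
    return x

def grad_quadratic(z):
    # Assumed test function h(z) = 0.5 * ||z||^2, whose gradient is z itself.
    return z

# Scaling operator M = c I: every step multiplies x by (1 - step * c**2).
c, step, n_iter = 0.5, 1.0, 10
M_scaling = c * np.eye(3)
x0 = np.array([1.0, -2.0, 3.0])
x_N = gradient_method_composed(grad_quadratic, M_scaling, x0, step, n_iter)
print(x_N)  # same as (1 - step * c**2) ** n_iter * x0
```

With $M = cI$, substituting $z = cx$ turns the iteration into the gradient method on $h$ with step size $\text{step}\cdot c^2$; the abstract reports that the exact worst case is always attained with such a scaling operator.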

Paper Structure (30 sections, 21 theorems, 62 equations, 4 figures, 1 table)

Key Result

Theorem 2.2

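Let $0 \leq \mu < L$ and consider the class $\mathcal{F}_{\mu,L}$ of $L$-smooth $\mu$-strongly convex functions. The set of triplets $\{(x_i,g_i,f_i)\}_{i\in [N]}$ is $\mathcal{F}_{\mu,L}$-interpolable if, and only if, for all $(i,j)\in [N]^2$,

$$f_i \geq f_j + g_j^T(x_i - x_j) + \frac{1}{2(1-\mu/L)}\left(\frac{1}{L}\left\lVert g_i - g_j\right\rVert^2 + \mu\left\lVert x_i - x_j\right\rVert^2 - 2\frac{\mu}{L}(g_j - g_i)^T(x_j - x_i)\right).$$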
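
The interpolation condition of Theorem 2.2 (Taylor, Hendrickx and Glineur, 2017, Theorem 4) is a finite list of pairwise inequalities, so it can be checked mechanically. The sketch below assumes the standard form of that inequality, written out in the docstring; the function name `is_F_mu_L_interpolable` and the tolerance `tol` are illustrative choices, and the sample data come from an assumed quadratic.

```python
import numpy as np

def is_F_mu_L_interpolable(xs, gs, fs, mu, L, tol=1e-9):
    """Return True if the triplets (x_i, g_i, f_i) satisfy, for every ordered pair (i, j),
    f_i >= f_j + g_j^T (x_i - x_j)
           + (||g_i - g_j||^2 / L + mu ||x_i - x_j||^2
              - 2 (mu / L) (g_j - g_i)^T (x_j - x_i)) / (2 (1 - mu / L)).
    """
    n = len(xs)
    for i in range(n):
        for j in range(n):
            dx = xs[i] - xs[j]
            dg = gs[i] - gs[j]
            # (g_j - g_i)^T (x_j - x_i) equals dg @ dx, since both factors flip sign.
            corr = (dg @ dg / L + mu * (dx @ dx) - 2 * mu / L * (dg @ dx)) / (2 * (1 - mu / L))
            if fs[i] < fs[j] + gs[j] @ dx + corr - tol:
                return False
    return True

# Samples of the quadratic f(x) = 0.5 x^T A x, whose Hessian eigenvalues lie in [0.1, 1].
A = np.diag([0.3, 0.7])
xs = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
gs = [A @ x for x in xs]
fs = [0.5 * x @ A @ x for x in xs]
print(is_F_mu_L_interpolable(xs, gs, fs, mu=0.1, L=1.0))      # True
fs_bad = fs.copy()
fs_bad[0] -= 1.0  # lowering one function value breaks the inequality for (i, j) = (0, 1)
print(is_F_mu_L_interpolable(xs, gs, fs_bad, mu=0.1, L=1.0))  # False
```

A PEP solver imposes exactly these inequalities as constraints on the unknown iterates, gradients and function values, rather than checking them on fixed data as done here.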

Figures (4)

  • Figure 1: Worst-case performance of 10 iterations of the gradient method (GM) for step sizes $h\in[0,2]$ on the classes $\mathcal{F}_{0}$ of $1$-smooth convex functions $f$ (dotted red line), $\mathcal{C}_{0.1}^{0}$ of $1$-smooth $0.1$-strongly convex functions $g \circ M$ (solid blue line), and $\mathcal{F}_{0.1}$ of $1$-smooth $0.1$-strongly convex functions $f$ (dashed black line).
  • Figure 2: Worst-case performance obtained by our extension of PEP for $N$ iterations of the Chambolle-Pock algorithm (CP) with step sizes satisfying $\tau\sigma L_M^2 \leq 1$ on the problem $\min_x F(x)$, where $F = f + g \circ M$, $f$ and $g$ are convex and proximable, and $M$ satisfies $0\leq \lVert M\rVert\leq 1$. The performance criterion is the primal-dual gap $\mathcal{L}(\bar{x}_N,u) - \mathcal{L}(x,\bar{u}_N)$ and the initial distance is $R^2 = \frac{\left\lVert x-x_0\right\rVert^2}{\tau} + \frac{\left\lVert u-u_0\right\rVert^2}{\sigma} - 2 (u-u_0)^TM(x-x_0)$. PEP results (blue dots) are compared to the bound of Theorem CP16 (red line).
  • Figure 3: Worst-case performance obtained by our extension of PEP for $N$ iterations of the Chambolle-Pock algorithm (CP) for several step sizes $\tau =\sigma$ on the problem $\min_x F(x)$, where $F = f + g \circ M$, $f$ and $g$ are convex and proximable, and $M$ satisfies $0\leq \lVert M\rVert\leq 1$. The performance criterion is the primal-dual gap $\mathcal{L}(\bar{x}_N,u) - \mathcal{L}(x,\bar{u}_N)$ and the initial distance is $R_0^2 = \left\lVert x-x_0\right\rVert^2 + \left\lVert u-u_0\right\rVert^2$.
  • Figure 4: Worst-case performance obtained by our extension of PEP for $N$ iterations of the Chambolle-Pock algorithm (CP) with step sizes $\tau =\sigma = 1$ on the problem $\min_x F(x)$, where $F = f + g \circ M$, $f$ and $g$ are $1$-Lipschitz, convex and proximable, and $M$ satisfies $0\leq \lVert M\rVert\leq 1$. The performance criterion is the objective function accuracy of the average (blue dots), last (red squares), best (green dots), last $\frac{N}{2}$ (magenta dots), and weighted sum (black dots) of the iterates. The curves $\frac{5}{N}$ (solid black line) and $\frac{1}{\sqrt{N}}$ (dashed black line) are shown for comparison.
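
For readers unfamiliar with the method studied in Figures 2 to 4, the following sketch implements the basic Chambolle-Pock iteration (Chambolle and Pock, 2011, with extrapolation parameter $\theta = 1$) and returns both the last and the uniformly averaged primal iterate. The exact (CP) variant and iterate ordering used in the paper may differ; the toy problem ($f(x)=\frac{1}{2}\lVert x-b\rVert^2$, $g=\lambda\lVert\cdot\rVert_1$, $M=I$) and all function names are assumptions for illustration, and the code does not reproduce the PEP worst-case curves.

```python
import numpy as np

def chambolle_pock(prox_tau_f, prox_sigma_gconj, M, x0, u0, tau, sigma, n_iter):
    """Basic Chambolle-Pock iteration (theta = 1) for min_x f(x) + g(M x).

    Returns the last primal iterate x_N and the uniform average of x_1, ..., x_N.
    """
    x = np.asarray(x0, dtype=float)
    u = np.asarray(u0, dtype=float)
    x_sum = np.zeros_like(x)
    for _ in range(n_iter):
        # Primal step: proximal step on f after moving along -M^T u.
        x_new = prox_tau_f(x - tau * (M.T @ u), tau)
        # Dual step: proximal step on g* at the extrapolated point 2 x_{k+1} - x_k.
        u = prox_sigma_gconj(u + sigma * (M @ (2 * x_new - x)), sigma)
        x = x_new
        x_sum += x
    return x, x_sum / n_iter

# Assumed toy problem: f(x) = 0.5 ||x - b||^2, g = lam * ||.||_1, M = I.
b, lam = np.array([3.0, -2.0, 0.5]), 1.0

def prox_f(v, t):
    # prox of t * f for f(x) = 0.5 ||x - b||^2
    return (v + t * b) / (1 + t)

def prox_gconj(w, s):
    # g* is the indicator of the l_inf ball of radius lam, so its prox is a clip.
    return np.clip(w, -lam, lam)

M = np.eye(3)
tau = sigma = 0.9  # tau * sigma * ||M||^2 = 0.81 <= 1
x_last, x_avg = chambolle_pock(prox_f, prox_gconj, M, np.zeros(3), np.zeros(3), tau, sigma, 500)
print(x_last)  # close to the soft-thresholding of b, i.e. [2, -1, 0]
```

Because $M = I$ here, the minimizer is the soft-thresholding of $b$ at level $\lambda$, which gives a simple sanity check; Figure 4 instead concerns worst-case behavior over $1$-Lipschitz convex $f$ and $g$, where the paper compares the last, average, and other combinations of iterates.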

Theorems & Definitions (41)

  • Definition 2.1: $\mathcal{F}$-interpolability
  • Theorem 2.2: [taylor2017smooth, Theorem 4]
  • Definition 2.3: $\mathcal{L}_{L}$-interpolability
  • Definition 2.4: $\mathcal{S}_{\mu,L}$-interpolability
  • Definition 2.5: $\mathcal{T}_{L}$-interpolability
  • Theorem 3.1: $\mathcal{L}_{L}$-interpolation conditions
  • Corollary 3.2: $\mathcal{T}_{L}$-interpolation conditions
  • Theorem 3.3: $\mathcal{S}_{\mu,L}$-interpolation conditions
  • Lemma 3.4: Existence of $\mathcal{L}_{L}$-interpolable factorizations of Gram matrices
  • Proof 1
  • ...and 31 more
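
The interpolation question behind Theorems 3.1 and 3.3 can be made concrete with a few matrix tests, storing the pairs as columns of $X$ and $Y$. The sketch below is a simplified illustration, not the paper's statements. For $\mathcal{L}_L$ it treats only pairs $x_i \mapsto y_i$ under the assumed bound $\lVert M\rVert \leq L$ (no evaluations of $M^T$), a case where $Y^TY \preceq L^2X^TX$ is necessary and sufficient and $M = YX^{+}$ (Moore-Penrose pseudoinverse) interpolates. For $\mathcal{S}_{\mu,L}$ it checks $X^TY = Y^TX$ and $(Y-\mu X)^T(LX-Y) \succeq 0$, which any symmetric $M$ with $\mu I \preceq M \preceq LI$ and $MX = Y$ must satisfy. All function names are hypothetical.

```python
import numpy as np

def is_L_L_interpolable(X, Y, L, tol=1e-9):
    """Pairs x_i -> y_i are the columns of X and Y; test Y^T Y <= L^2 X^T X (Loewner order)."""
    gap = L**2 * (X.T @ X) - Y.T @ Y
    return np.linalg.eigvalsh(gap).min() >= -tol

def interpolating_operator(X, Y):
    """One interpolating operator, M = Y X^+; it satisfies M X = Y and ||M|| <= L
    whenever Y^T Y <= L^2 X^T X holds."""
    return Y @ np.linalg.pinv(X)

def satisfies_S_mu_L_necessary(X, Y, mu, L, tol=1e-9):
    """Necessary conditions for a symmetric M with mu I <= M <= L I and M X = Y."""
    symmetric = np.allclose(X.T @ Y, Y.T @ X, atol=tol)
    # If Y = M X, then P = X^T (M - mu I)(L I - M) X, which is positive semidefinite.
    P = (Y - mu * X).T @ (L * X - Y)
    return symmetric and np.linalg.eigvalsh((P + P.T) / 2).min() >= -tol

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))            # three points x_i in R^4, stored as columns
M_true = rng.standard_normal((2, 4))
M_true /= np.linalg.norm(M_true, 2)        # rescale so that ||M_true|| = 1
Y = M_true @ X
M_fit = interpolating_operator(X, Y)
print(is_L_L_interpolable(X, Y, L=1.0))    # True

# Symmetric operator with eigenvalues in [0.5, 2]
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
M_sym = Q @ np.diag([0.6, 1.0, 1.8]) @ Q.T
Xs = rng.standard_normal((3, 5))
Ys = M_sym @ Xs
print(satisfies_S_mu_L_necessary(Xs, Ys, mu=0.5, L=2.0))  # True
```

As a sanity check on the first test, two identical inputs mapped to different outputs fail it for every $L$: with $a = e_1 - e_2$, the quadratic form gives $L^2\lVert x_1 - x_2\rVert^2 - \lVert y_1 - y_2\rVert^2 = -\lVert y_1 - y_2\rVert^2 < 0$.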