A Near-Optimal Total Complexity for the Inexact Accelerated Proximal Gradient Method via Quadratic Growth

Hongda Li, Xianfu Wang

Abstract

We consider the optimization problem $\min_{x\in \mathbb R^n}\{F(x):=f(x)+\omega(Ax)\}$, where $f$ is an $L$-Lipschitz smooth function, and $\omega$ is a proper, lower semicontinuous, and convex function. We prove that when $\omega$ is a conic polyhedral function, the inexact accelerated proximal gradient method (IAPG), employed in a double-loop structure, achieves a total complexity of $\mathcal O(\ln(1/\varepsilon)/\sqrt{\varepsilon})$, measured by the total number of calls to the proximal operator of the convex conjugate $\omega^\star$ and to the gradient of $f$, to reach $\varepsilon$-optimality in function value. To the best of our knowledge, this improves upon the best-known complexity for IAPG. The key theoretical ingredient is a quadratic growth condition on the dual of the inexact proximal problem, which arises from the conic polyhedral structure of $\omega$ and implies linear convergence of the inner proximal gradient loop. To validate these findings, we conduct numerical experiments on a robust TV-$\ell_2$ signal recovery problem, demonstrating fast convergence.
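The double-loop structure described in the abstract lends itself to a compact sketch. The following Python fragment is a minimal illustration under stated assumptions, not the paper's Algorithm: the outer loop runs FISTA-style accelerated steps on $f$, and the inner loop approximates the proximal map of $\omega \circ A$ by proximal gradient on its Fenchel dual, which only requires $\operatorname{prox}_{\omega^\star}$ and $\nabla f$, the two oracles counted in the total complexity. The interface (`iapg`, `prox_w_conj`), the tolerance schedule `eps_k`, and the stopping test are illustrative assumptions.

```python
import numpy as np

def iapg(grad_f, L, A, prox_w_conj, x0, n_outer=200):
    """Sketch of a double-loop inexact accelerated proximal gradient method
    for min_x f(x) + w(A x); `prox_w_conj(u, tau)` is assumed to return
    prox_{tau * w*}(u)."""
    lam = 1.0 / L                               # outer step size
    m, _ = A.shape
    L_dual = lam * np.linalg.norm(A, 2) ** 2    # Lipschitz constant of the smooth dual part
    x, x_prev, t = x0.copy(), x0.copy(), 1.0
    for k in range(n_outer):
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x + ((t - 1) / t_next) * (x - x_prev)   # Nesterov extrapolation
        z = y - lam * grad_f(y)                     # forward (gradient) step on f
        eps_k = 1.0 / (k + 2) ** 3                  # decreasing inner tolerance (assumed schedule)
        # Inner loop: proximal gradient on the dual of
        #   min_x w(Ax) + (1/(2 lam)) ||x - z||^2,
        # namely min_v w*(v) - <v, A z> + (lam/2) ||A^T v||^2.
        v = np.zeros(m)
        for _ in range(10_000):
            g = lam * (A @ (A.T @ v)) - A @ z       # gradient of the smooth dual part
            v_next = prox_w_conj(v - g / L_dual, 1.0 / L_dual)
            done = np.linalg.norm(v - v_next) * L_dual <= eps_k  # gradient-mapping test
            v = v_next
            if done:
                break
        x_prev, x = x, z - lam * (A.T @ v)          # primal recovery from the dual iterate
        t = t_next
    return x
```

For the paper's TV-$\ell_2$ experiment, for instance, $A$ would roughly play the role of a finite-difference operator, and $\operatorname{prox}_{\omega^\star}$ would reduce to a box projection, since the conjugate of the $\ell_1$ norm is the indicator of the $\ell_\infty$ ball.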

Paper Structure

This paper contains 32 sections, 28 theorems, 129 equations, 4 figures, and 1 algorithm.

Key Result

Lemma 2.7

Let $g: \mathbb R^n \rightarrow \overline{\mathbb R}$ be a closed, convex, and proper function. Then an inexact Moreau decomposition holds: an equivalence between inexact evaluations of the proximal operator of $g$ and inexact evaluations of the proximal operator of its convex conjugate $g^\star$.
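For reference, the classical (exact) Moreau decomposition, of which Lemma 2.7 is the inexact analogue, states that for any such $g$ and any $\lambda > 0$,

$$x = \operatorname{prox}_{\lambda g}(x) + \lambda\, \operatorname{prox}_{\lambda^{-1} g^\star}(x/\lambda) \qquad \text{for all } x \in \mathbb R^n.$$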

Figures (4)

  • Figure 1: Five-number summary of the smallest inner loop iteration $j$ such that $\mathbf G_\lambda(z_j, v_j) \le \epsilon^\circ_i$, plotted against $\epsilon^\circ_i$. The linear growth of $j$ with $-\log_2(\epsilon_i^\circ)$ confirms the $\mathcal{O}(\ln(\epsilon^{-1}))$ bound.
  • Figure 2: Comparing the recovered signal with the observed signal $\tilde{x}$ and the ground-truth signal $\bar{x}$.
  • Figure 3: (a): The model for the reference line is $y = a + b\ln(\epsilon_k^\circ)$ and the fitted values are: $a\approx -8.12\times 10^4, b\approx -9.63\times 10^4$. (b): The model for the reference line is $y = a + b \ln(k)$. The values are $a \approx -1.39\times 10^{5}, b \approx 3.96 \times 10^{4}$.
  • Figure 4: (a): The model we fitted for the reference line is $y = c\max(1, \log(\max(c_1, x))^{a})/\max(c_1, x)^{b}$, where $\Vert x_k - y_k\Vert$ is on the y-axis and $\sum_{i = 0}^{k}J_i$ is on the x-axis. The best fitted values are $c \approx 2.41\times 10^{5}$, $c_1\approx 6.72\times 10^{5}$, $a\approx 6.89$, $b\approx 2.33$. (b): The figure shows the relative error $\frac{\rho_k}{2}\Vert x_k - y_k\Vert^2$ and the absolute error $\epsilon_k^\circ$, illustrating that the relative error remains larger than the absolute error for the choice $\rho_k = B_k$.
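Figure 1's inner-loop stopping test compares a gradient-mapping norm against the tolerance $\epsilon_i^\circ$. As a rough illustration, assuming the standard proximal gradient mapping (the paper's exact definition of $\mathbf G_\lambda(z_j, v_j)$ may scale differently), such a test can be computed as:

```python
import numpy as np

def prox_grad_mapping_norm(v, grad_smooth, prox, step):
    # Standard proximal gradient mapping: G(v) = (v - T(v)) / step,
    # where T(v) = prox(v - step * grad_smooth(v)) is one prox-gradient step.
    # Its norm vanishes exactly at minimizers, so ||G(v)|| <= eps is a
    # natural inner-loop stopping test.
    v_plus = prox(v - step * grad_smooth(v), step)
    return np.linalg.norm(v - v_plus) / step
```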

Theorems & Definitions (93)

  • Definition 2.1: $\epsilon$-subgradient (Zălinescu, 2002)
  • Remark 2.2
  • Definition 2.4: The Inexact proximal operator
  • Remark 2.5
  • Lemma 2.7: inexact Moreau decomposition
  • Proof
  • Definition 2.8: Bregman Divergence of a differentiable function
  • Remark 2.9
  • Definition 2.10: Lipschitz smoothness
  • Remark 2.11
  • ...and 83 more