
Faster Accelerated First-order Methods for Convex Optimization with Strongly Convex Function Constraints

Zhenwei Lin, Qi Deng

TL;DR

This paper introduces faster accelerated primal-dual algorithms for minimizing a convex function subject to strongly convex function constraints. By exploiting the strong convexity of the constraints, these algorithms achieve an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$.

Abstract

In this paper, we introduce faster accelerated primal-dual algorithms for minimizing a convex function subject to strongly convex function constraints. Prior to our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$, regardless of the strong convexity of the constraint function. It was unclear whether the strong convexity assumption could enable better convergence results. To address this question, we develop novel techniques to progressively estimate the strong convexity of the Lagrangian function. Our approach is the first to effectively leverage the strong convexity of the constraints, obtaining an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$. This rate matches the complexity lower bound for strongly-convex-concave saddle point optimization and is therefore order-optimal. We demonstrate the superior performance of our methods on sparsity-inducing constrained optimization problems, notably Google's personalized PageRank problem. Furthermore, we show that a restarted version of the proposed methods identifies the sparsity pattern of the optimal solution within a finite number of steps, a result that appears to have independent significance.
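To make the role of constraint strong convexity concrete, here is a toy sketch that is not the paper's algorithm. It minimizes a linear objective $c^\top x$ subject to the single strongly convex constraint $g(x) = \tfrac12\|x\|^2 - r \le 0$. The Lagrangian $L(x, y) = c^\top x + y(\tfrac12\|x\|^2 - r)$ is $y$-strongly convex in $x$ even though the objective alone is not, so its strong convexity modulus depends on the (unknown) dual variable. The iteration is a plain, non-accelerated primal-dual scheme (an exact proximal step on $L(\cdot, y_k)$ followed by projected gradient ascent on $y$); the function name `toy_primal_dual`, the step sizes `tau` and `sigma`, and the problem data are illustrative assumptions.

```python
import numpy as np


def toy_primal_dual(c, r, tau=1.0, sigma=0.5, iters=500):
    """Basic primal-dual iteration for  min c^T x  s.t.  g(x) = 0.5*||x||^2 - r <= 0.

    Hypothetical illustration only: a non-accelerated scheme, not the paper's method.
    """
    c = np.asarray(c, dtype=float)
    x = np.zeros_like(c)
    y = 1.0
    for _ in range(iters):
        # Exact proximal step on L(., y):
        #   argmin_x  c^T x + y*(0.5*||x||^2 - r) + ||x - x_prev||^2 / (2*tau).
        # The optimality condition c + y*x + (x - x_prev)/tau = 0 gives:
        x = (x / tau - c) / (y + 1.0 / tau)
        # Projected gradient ascent on the multiplier (y must stay >= 0).
        g = 0.5 * x @ x - r
        y = max(0.0, y + sigma * g)
    # g has strong convexity modulus 1, so L(., y) is (y * 1)-strongly convex in x.
    mu_lagrangian = y * 1.0
    return x, y, mu_lagrangian


x, y, mu = toy_primal_dual(c=[3.0, 4.0], r=2.0)
print(x, y, mu)  # x close to [-1.2, -1.6]; y and mu close to 2.5
```

Because the modulus equals $y$ times the constraint's modulus, it is only available once the multiplier is known, and it vanishes if the constraint is inactive ($y^* = 0$). This is why the Lagrangian's strong convexity cannot simply be fixed in advance and has to be estimated as the iterates evolve.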

Paper Structure

This paper contains 34 sections, 17 theorems, 110 equations, 4 figures, 3 tables, and 3 algorithms.

Key Result

Proposition 1

Suppose Slater's condition holds. Then, for any optimal solution $\mathbf{x}^*$ of the constrained problem, there exists $\mathbf{y}^*\in\mathbb{R}^m$ such that the KKT condition (Definition 1) holds. Moreover, $\mathbf{y}^*$ falls into the set $\mathcal{Y}:=\{\mathbf{y}\mid \Vert\mathbf{y}\Vert_1\leq \bar{c}$ …
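Definition 1 is not reproduced here, so the check below assumes the standard KKT conditions for $\min f(\mathbf{x})$ subject to $g_i(\mathbf{x})\le 0$: stationarity, primal feasibility, dual feasibility, and complementary slackness. It uses a hypothetical toy instance ($f(\mathbf{x}) = c^\top\mathbf{x}$ with one constraint $g(\mathbf{x}) = \tfrac12\|\mathbf{x}\|^2 - r$) whose solution has a closed form, and compares $\|\mathbf{y}^*\|_1$ with the classical Slater-based bound $(f(\tilde{\mathbf{x}}) - f^*)/\min_i(-g_i(\tilde{\mathbf{x}}))$ for a strictly feasible point $\tilde{\mathbf{x}}$. Whether this classical bound coincides with the paper's $\bar{c}$ is an assumption, since the exact definition of $\bar{c}$ is not shown here.

```python
import numpy as np

# Hypothetical toy instance: min c^T x  s.t.  g(x) = 0.5*||x||^2 - r <= 0.
c = np.array([3.0, 4.0])
r = 2.0


def f(x):
    return c @ x


def g(x):
    return 0.5 * x @ x - r


# Closed form: x* lies on the sphere ||x|| = sqrt(2r), opposite to c, and
# stationarity c + y* x* = 0 (grad g(x) = x) then gives y* = ||c|| / sqrt(2r).
radius = np.sqrt(2.0 * r)
x_star = -radius * c / np.linalg.norm(c)
y_star = np.linalg.norm(c) / radius

# KKT conditions, checked up to floating-point tolerance.
stationarity = np.linalg.norm(c + y_star * x_star)  # ||grad f + y* grad g||
primal_feasible = g(x_star) <= 1e-12
dual_feasible = y_star >= 0.0
complementarity = abs(y_star * g(x_star))

# Classical Slater-based bound on ||y*||_1 from the strictly feasible point x~ = 0.
x_tilde = np.zeros_like(c)
c_bar = (f(x_tilde) - f(x_star)) / (-g(x_tilde))

print(stationarity, primal_feasible, dual_feasible, complementarity)
print(abs(y_star), "<=", c_bar)  # expect ||y*||_1 <= c_bar
```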

Figures (4)

  • Figure 1: The first row shows convergence to the optimum: the $y$-axis reports $\log_{10}((\|D^{1/2}\mathbf{x}_{k}\|_{1}-\|D^{1/2}\mathbf{x}^{*}\|_{1})/\|D^{1/2}\mathbf{x}^{*}\|_{1})$ for rAPDPro and $\log_{10}((\|D^{1/2}\bar{\mathbf{x}}_{k}\|_{1}-\|D^{1/2}\mathbf{x}^{*}\|_{1})/\| D^{1/2}\mathbf{x}^{*}\|_{1})$ for APD, APD+restart, msAPD, and Mirror-Prox ($\mathbf{x}^{*}$ is computed with MOSEK [aps2019mosek]). The second row shows feasibility violation: the $y$-axis reports the feasibility gap $\log_{10}(\max\{0,G(\mathbf{x}_k)\})$ for rAPDPro and $\log_{10}(\max\{0,G(\bar{\mathbf{x}}_k)\})$ for APD, msAPD, and Mirror-Prox. From left to right, the datasets are bio-CE-HT, bio-CE-LC, and econ-beaflw.
  • Figure 2: Results on active-set identification. From left to right, the datasets are bio-CE-HT, bio-CE-LC, and econ-beaflw. The $x$-axis reports the iteration number and the $y$-axis reports the accuracy of active-set identification.
  • Figure 3: The first row shows objective convergence to the optimum: the $y$-axis reports $\log_{10}((\|D^{1/2}\mathbf{x}_{k}\|_{1}-\|D^{1/2}\mathbf{x}^{*}\|_{1})/\|D^{1/2}\mathbf{x}^{*}\|_{1})$ for rAPDPro and $\log_{10}((\|D^{1/2}\bar{\mathbf{x}}_{k}\|_{1}-\|D^{1/2}\mathbf{x}^{*}\|_{1})/\| D^{1/2}\mathbf{x}^{*}\|_{1})$ for APD, msAPD, and Mirror-Prox. The second row shows feasibility violation: the $y$-axis reports the feasibility gap $\log_{10}(\max\{0, G(\mathbf{x}_k)\})$ for rAPDPro and $\log_{10}(\max\{0,G(\bar{\mathbf{x}}_k)\})$ for APD, APD+restart, msAPD, and Mirror-Prox. From left to right, the datasets are DD68, DD242, and peking-1.
  • Figure 4: Results on active-set identification. From left to right, the datasets are DD68, DD242, and peking-1. The $x$-axis reports the iteration number and the $y$-axis reports the accuracy of active-set identification.
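The plotted quantities can be computed roughly as in the sketch below. It assumes $D$ is a diagonal matrix stored as the vector `d` of its diagonal entries, and that the constraint function $G$ and the reference solution `x_star` (computed with MOSEK in the paper) are supplied by the caller; per the captions, pass the last iterate $\mathbf{x}_k$ for rAPDPro and the averaged iterate $\bar{\mathbf{x}}_k$ for the other methods. The captions do not define "accuracy in active-set identification", so `support_accuracy` is a hypothetical stand-in: the fraction of coordinates whose zero/nonzero status agrees with that of $\mathbf{x}^*$. All function names and the example data are illustrative.

```python
import math

import numpy as np


def safe_log10(v):
    """log10 for plotting; non-positive values have no logarithm, so return -inf."""
    return math.log10(v) if v > 0 else float("-inf")


def relative_objective_gap(x, x_star, d):
    """log10((||D^{1/2} x||_1 - ||D^{1/2} x*||_1) / ||D^{1/2} x*||_1) with D = diag(d)."""
    sqrt_d = np.sqrt(np.asarray(d, dtype=float))
    obj = np.abs(sqrt_d * x).sum()
    obj_star = np.abs(sqrt_d * x_star).sum()
    return safe_log10((obj - obj_star) / obj_star)


def feasibility_gap(x, G):
    """log10(max{0, G(x)}); equals -inf whenever x is feasible."""
    return safe_log10(max(0.0, G(x)))


def support_accuracy(x, x_star, tol=1e-8):
    """Hypothetical active-set metric: share of coordinates whose
    zero/nonzero pattern (|x_i| > tol) matches that of x*."""
    return float(np.mean((np.abs(x) > tol) == (np.abs(x_star) > tol)))


# Made-up data with a made-up strongly convex constraint G(x) = 0.5*||x||^2 - 0.75.
def G(x):
    return 0.5 * x @ x - 0.75


d = [1.0, 4.0]
x_k = np.array([1.0, 1.0])
x_star = np.array([1.0, 0.0])
print(relative_objective_gap(x_k, x_star, d))  # log10(2)
print(feasibility_gap(x_k, G))                 # log10(0.25)
print(support_accuracy(x_k, x_star))           # 0.5
```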

Theorems & Definitions (29)

  • Definition 1: KKT condition
  • Remark 1
  • Proposition 1
  • Proposition 2
  • Remark 2
  • Proposition 3
  • Proposition 4
  • Theorem 1
  • Corollary 1
  • Remark 3
  • ...and 19 more