Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization

Qi Deng, Guanghui Lan, Zhenwei Lin

Abstract

This paper presents new first-order methods for achieving optimal oracle complexities in convex optimization with convex functional constraints. Oracle complexities are measured by the number of function and gradient evaluations. To achieve this, we enable first-order methods to utilize computational oracles for solving diagonal quadratic programs in subproblems. For problems where the optimal value $f^*$ is known, such as those in overparameterized models and feasibility problems, we propose an accelerated first-order method that incorporates a modified Polyak step size and Nesterov's momentum. Notably, our method does not require knowledge of the smoothness levels, the Hölder continuity parameter of the gradient, or an additional line search, yet it achieves the optimal oracle complexity bound of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ under Hölder smoothness conditions. When $f^*$ is unknown, we reformulate the problem as finding the root of the optimal value function and develop inexact fixed-point iteration and secant methods to compute $f^*$. These root-finding subproblems are solved inexactly using first-order methods to a specified relative accuracy. We employ the accelerated prox-level (APL) method, which is proven to be uniformly optimal for convex optimization with simple constraints. Our analysis demonstrates that APL-based level-set methods also achieve the optimal oracle complexity of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ for convex function-constrained optimization, without requiring knowledge of any problem-specific structures. Through experiments on various tasks, we demonstrate the advantages of our methods over existing approaches in function-constrained optimization.
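To make the role of a known optimal value concrete, the following is a minimal sketch of the classical (unaccelerated) Polyak step size, not the modified, momentum-based method proposed in the paper. It assumes an unconstrained, overparameterized least-squares problem whose optimal value is $f^*=0$; the function name `polyak_gradient_descent`, the problem sizes, and the stopping tolerance are illustrative choices rather than anything taken from the paper.

```python
import numpy as np


def polyak_gradient_descent(f, grad, x0, f_star, iters=5000, tol=1e-12):
    """Classical Polyak step: x <- x - (f(x) - f*) / ||g||^2 * g.

    Illustrative only: this is the textbook rule, not the paper's
    accelerated variant with a modified Polyak step and momentum.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        gap = f(x) - f_star          # current optimality gap
        g = grad(x)
        g_norm_sq = float(g @ g)
        if gap <= tol or g_norm_sq == 0.0:
            break                    # (near-)optimal point reached
        # The step length needs only f*, no smoothness constant.
        x = x - (gap / g_norm_sq) * g
    return x


# Assumed test problem: a wide, consistent linear system, so f* = 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 100))
b = A @ rng.standard_normal(100)


def f(x):
    return 0.5 * float(np.sum((A @ x - b) ** 2))


def grad(x):
    return A.T @ (A @ x - b)


x_hat = polyak_gradient_descent(f, grad, np.zeros(100), f_star=0.0)
final_gap = f(x_hat)
```

The step length is computed from the gap $f(\mathbf{x}_k)-f^*$ and the gradient norm alone, which is why no Lipschitz or Hölder constant appears anywhere in the loop.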

Paper Structure

This paper contains 35 sections, 14 theorems, 70 equations, 5 figures, 4 tables, 9 algorithms.

Key Result

Theorem 2.1

Let $\mathbf{x}^{*}$ be an optimal solution of the function-constrained problem and $f^*=f(\mathbf{x}^*)$. Suppose $\mathbf{x}^*\in X_k$ for $k=0,1,2,\ldots$. Define the sequence $\{\Gamma_{k}\}_{k\ge1}$ by $\Gamma_{1}=1$ and $\Gamma_{k}=\prod_{i=2}^{k}(1-\alpha_{i})^{-1}$ for $k\ge 2$, and … where $\hat{M}=\max_{0\le i\le m}\sup_{\bar{\mathbf{x}},\hat{\mathbf{x}}\in\{\mathbf{x}:\,\ldots\}}\ldots$

Figures (5)

  • Figure 1: SOCP convergence. Left: 500 variables, 200 equality constraints, and 10 cones, each of dimension 50. Right: 1000 variables, 800 equality constraints, and 10 cones, each of dimension 100.
  • Figure 2: Convergence performance on LMI. Left: $(q,k) = (20,10)$; Right: $(q,k) = (40,20)$.
  • Figure 3: Gradient evaluations vs. $\beta$ for convex QCQP ($\varepsilon = 10^{-3}$). Left: $(m,n)=(10,1000)$; Right: $(m,n)=(10,2000)$. IFP: Inexact Fixed Point; TIS: Truncated Inexact Secant.
  • Figure 4: Results on NPC. $y$-axis: $\max\{|f(\mathbf{x}_k)-f^*|,\{[g_i(\mathbf{x}_k)]_{+}\}_i\}$. $x$-axis: first panel uses gradient evaluations to meet the accuracy of $\leq 10^{-3}$ (our methods); the second and third panels report iALM outer iterations and total gradient evaluations, respectively, under a complementarity tolerance of $\leq 10^{-3}$.
  • Figure 5: Results on multi-class NPC. $y$-axis: $\max\{|f(\mathbf{x}_k)-f^*|,\{[g_i(\mathbf{x}_k)]_{+}\}_i\}$. $x$-axis: gradient evaluations for our methods to reach a tolerance of $\leq 10^{-3}$ and iALM to reach a complementarity tolerance of $\leq 10^{-3}$.

Theorems & Definitions (23)

  • Theorem 2.1
  • Remark 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Proposition 3.1
  • Remark 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Corollary 3.4
  • Theorem 3.5
  • ...and 13 more