Solving Convex Smooth Function Constrained Optimization Is Almost As Easy As Unconstrained Optimization
Zhe Zhang, Guanghui Lan
TL;DR
This work tackles smooth function-constrained optimization, in which the objective $F(x)=f(x)+u(x)$ is minimized over a convex set $X$ subject to inequality constraints $g(x)\le 0$. It introduces the Accelerated Constrained Gradient Descent (ACGD) method, a single-loop algorithm that replaces the standard descent step with a constrained descent step derived from a nested Lagrangian, and extends it to ACGD-S for large-scale problems using a sliding technique. The authors establish matching lower bounds and provide adaptive, parameter-free variants that use verifiable certificates (FP-gap and PD-gap) to tune Lipschitz-related parameters automatically, achieving near-optimal oracle and computation complexities. Together, these results offer a near-complete characterization of the hardness of smooth function-constrained optimization and demonstrate practical viability for high-dimensional problems with many constraints.
Abstract
While Nesterov's Accelerated Gradient Descent (AGD) efficiently solves constrained problems when the constraint set $X \subseteq \mathbb{R}^n$ is simple and easy to project onto, it has remained an open question whether function-constrained problems $\min_{x \in X} \{F(x) : g(x) \leq 0\}$ can be solved as efficiently as unconstrained problems in terms of oracle complexity. We provide an affirmative answer by proposing the Accelerated Constrained Gradient Descent (ACGD) method, a single-loop algorithm that modifies AGD by replacing the descent step with a constrained descent step, which adds only a few linear constraints to the prox mapping. ACGD achieves nearly the same oracle complexity as minimizing the optimal Lagrangian function (with the multiplier fixed at its optimal value). We establish matching lower bounds, demonstrating that these complexity results are unimprovable. For large-scale problems with many constraints, we introduce ACGD-S, which replaces the computationally demanding constrained descent step with basic matrix-vector multiplications while maintaining optimal oracle and computation complexities. Together, these methods provide a nearly complete characterization of the hardness of smooth function-constrained optimization. We also propose parameter-free adaptive versions that achieve optimal oracle complexity (requiring only the strong convexity modulus) and present encouraging numerical results demonstrating their efficiency.
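To make the idea of a prox mapping with added linear constraints concrete, the following is a minimal sketch of one constrained descent step in the simplest setting. It assumes $X = \mathbb{R}^n$, a Euclidean prox term, and a single constraint linearized at a point `x_lin`; under these assumptions the step reduces to a gradient step followed by a projection onto a half-space. The function name, argument names, and the step-size parameter `eta` are hypothetical and chosen for illustration; the extrapolation sequences and step-size rules of ACGD itself are not reproduced here.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def constrained_descent_step(
    x_prev: Vector,
    x_lin: Vector,
    grad_f: Vector,
    g_val: float,
    grad_g: Vector,
    eta: float,
) -> Vector:
    """Illustrative prox step with one linearized constraint (X = R^n assumed).

    Solves
        min_x  <grad_f, x> + (eta / 2) * ||x - x_prev||^2
        s.t.   g_val + <grad_g, x - x_lin> <= 0,
    where g_val = g(x_lin) and grad_g is the gradient of g at x_lin.
    """
    # Without the constraint, the minimizer is an ordinary gradient step.
    y = [xp - gf / eta for xp, gf in zip(x_prev, grad_f)]

    # Value of the linearized constraint at the unconstrained minimizer.
    violation = g_val + dot(grad_g, [yi - xi for yi, xi in zip(y, x_lin)])
    norm_sq = dot(grad_g, grad_g)

    # If y already satisfies the linearized constraint (or the constraint
    # has zero gradient), the constraint is inactive and y is the solution.
    if violation <= 0.0 or norm_sq == 0.0:
        return y

    # Otherwise project y onto the half-space defined by the linearization.
    step = violation / norm_sq
    return [yi - step * gi for yi, gi in zip(y, grad_g)]


# Toy example: f(x) = 0.5 * ||x - c||^2 with c = (2, 0), g(x) = ||x||^2 - 1.
x0 = [1.0, 1.0]
c = [2.0, 0.0]
grad_f0 = [xi - ci for xi, ci in zip(x0, c)]   # gradient of f at x0
g0 = dot(x0, x0) - 1.0                          # g(x0) = 1
grad_g0 = [2.0 * xi for xi in x0]               # gradient of g at x0
x1 = constrained_descent_step(x0, x0, grad_f0, g0, grad_g0, eta=1.0)
print(x1)  # [1.75, -0.25]
```

In the toy example the unconstrained gradient step lands at $(2, 0)$, which violates the linearized constraint, so the result is pulled back onto its boundary. With several constraints or a nontrivial set $X$, the same step has no such closed form in general and becomes a small quadratic program.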
