First-Order Methods for Nonsmooth Nonconvex Functional Constrained Optimization with or without Slater Points

Zhichao Jia; Benjamin Grimmer

TL;DR

This work addresses constrained optimization with nonsmooth, nonconvex objectives and constraints on a closed convex set $X$, proposing a simple inexact proximal point method. The method uses an inner switching subgradient loop to solve proximal subproblems, guaranteeing feasibility and achieving $O(1/ε^4)$ subgradient evaluations to reach an $ε$-stationary point. It yields approximate Fritz–John points without constraint qualification and approximate KKT points under CQ (via a $\sigma$-strong MFCQ bound on multipliers), without requiring compactness of the feasible set. The approach is demonstrated on sparsity-inducing SCAD constraints, showing realistic behavior under CQ failures and active/inactive constraint regimes, with potential extensions to stochastic settings and structure-aware acceleration. Overall, the paper advances first-order methods for nonsmooth nonconvex constrained optimization by guaranteeing feasibility and stationarity in broad settings, including when classical constraint qualifications fail.

Abstract

Constrained optimization problems where both the objective and constraints may be nonsmooth and nonconvex arise across many learning and data science settings. In this paper, we show for any Lipschitz, weakly convex objectives and constraints, a simple first-order method finds a feasible, $ε$-stationary point at a convergence rate of $O(ε^{-4})$ without relying on compactness or Constraint Qualification (CQ). When CQ holds, this convergence is measured by approximately satisfying the Karush-Kuhn-Tucker conditions. When CQ fails, we guarantee the attainment of weaker Fritz-John conditions. As an illustrative example, our method stably converges on piecewise quadratic SCAD regularized problems despite frequent violations of constraint qualification. The considered algorithm is similar to those of "Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints" by Ma et al. and "Stochastic first-order methods for convex and nonconvex functional constrained optimization" by Boob et al. (whose guarantees further assume compactness and CQ), iteratively taking inexact proximal steps, computed via an inner loop applying a switching subgradient method to a strongly convex constrained subproblem. Our non-Lipschitz analysis of the switching subgradient method appears to be new and may be of independent interest.
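The scheme described in the abstract (inexact proximal steps, each computed by a switching subgradient inner loop on a strongly convex constrained subproblem) can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions, not the authors' exact algorithm: the step size rule `2 / (mu * (t + 2))`, the averaging weights, the tolerance `tol`, the toy problem, and all function names are hypothetical choices made for this example.

```python
def switching_subgradient(F, dF, G, dG, z0, mu, T, tol):
    """Approximately minimize F subject to G <= 0, both mu-strongly convex.

    Assumed (hypothetical) rule: step on the objective when the constraint
    is within tol, otherwise step on the constraint.
    """
    z = z0
    weighted_sum, total_weight = 0.0, 0.0
    for t in range(T):
        step = 2.0 / (mu * (t + 2))
        if G(z) <= tol:
            # Nearly feasible: record the iterate, then step on the objective.
            weight = t + 1.0
            weighted_sum += weight * z
            total_weight += weight
            z = z - step * dF(z)
        else:
            # Infeasible: step on the constraint instead.
            z = z - step * dG(z)
    # Average of recorded iterates; fall back to the start if none qualified.
    return weighted_sum / total_weight if total_weight > 0 else z0


def inexact_proximal_point(f, df, g, dg, x0, rho_hat, mu, K, T, tol):
    """Outer loop: add (rho_hat/2)(z - x_k)^2 to both f and g, solve inexactly."""
    x = x0
    for _ in range(K):
        c = x  # current proximal center x_k
        F = lambda z, c=c: f(z) + 0.5 * rho_hat * (z - c) ** 2
        dF = lambda z, c=c: df(z) + rho_hat * (z - c)
        G = lambda z, c=c: g(z) + 0.5 * rho_hat * (z - c) ** 2
        dG = lambda z, c=c: dg(z) + rho_hat * (z - c)
        x = switching_subgradient(F, dF, G, dG, c, mu, T, tol)
    return x


# Toy problem (an assumption for illustration): minimize |x - 2| s.t. x^2 - 1 <= 0.
f = lambda x: abs(x - 2.0)
df = lambda x: -1.0 if x < 2.0 else 1.0  # a valid subgradient of f
g = lambda x: x * x - 1.0
dg = lambda x: 2.0 * x

x_final = inexact_proximal_point(f, df, g, dg, x0=0.0, rho_hat=1.0, mu=1.0,
                                 K=5, T=2000, tol=1e-3)
```

Because the returned point is a weighted average of iterates satisfying the regularized constraint up to `tol`, and that constraint is convex, every outer iterate in this sketch stays feasible for `g` up to `tol`. The paper's own method and parameter choices should be taken from the paper itself.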

Paper Structure (42 sections, 8 theorems, 78 equations, 5 figures, 2 tables, 2 algorithms)

Introduction
(Inexact) Proximal Point Methods
Fritz-John/Karush-Kuhn-Tucker Stationarity
Contribution
Always Feasible Iterates
Stationarity with or without Constraint Qualification
Convergence Rates without Compactness
Related Work
Fritz-John and KKT Points in Smooth Optimization
Inexact Proximal Methods
Special Case of (Strongly) Convex Constraints
Comparison with Ma, Lin, and Yang
Comparison with Boob, Deng, and Lan
Alternative Approaches given Nonconvex Constraints
Vignette: Failure of MFCQ Assumptions for Sparse Regularized Problems
...and 27 more sections

Key Result

Lemma 2.5

When Assumptions assumption1-assumption4 hold and $\hat{\rho}>\max\{\rho,1\}$, if $\|\hat{x}_{k+1}-x_k\| \leq \frac{\epsilon}{\hat{\rho}}$, then $x_k$ is an $(\epsilon,\epsilon)$-FJ point. If, additionally, Assumption assumption5 holds, then a dual optimal $\lambda_k$ for the subproblem exists and if $\|

Figures (5)

Figure 1: The SCAD function $s$ and feasible regions in 3D given by $\sum_i s(x_i)\leq p$.
Figure 2: Lagrange multipliers computed at approximate stationary points reached by iterating the constrained proximal subproblem on $30$ randomly generated SPR problems (see the numerical experiments section for the exact construction). As $p$ varies from $60$ to $120$, the black line shows the average approximate multipliers reached and the gray region shows the range between maximum and minimum values seen. Black dots are placed at each multiple of three, where MFCQ fails to hold.
Figure 3: Finding FJ Stationarity: $K=10^3,T=10^4,p=90$. Dotted lines show where the stopping criteria applied. $x_{lo}$ is the stationary point near the final iterate.
Figure 4: Finding Active KKT Stationarity: $K=10^3,T=10^4,p=91$. Dotted lines show where the stopping criteria applied. $x_{lo}$ is the stationary point near the final iterate.
Figure 5: Finding Inactive KKT Stationarity: $K=10^3,T=10^4,p=320$. Dotted lines show where the stopping criteria applied. $x_{lo}$ is the stationary point near the final iterate.
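For readers unfamiliar with the SCAD function $s$ shown in Figure 1, the following is a minimal sketch of the standard piecewise quadratic SCAD penalty and of a constraint of the form $\sum_i s(x_i)\leq p$. The parameter values `lam=1.0` and `a=3.7` and the function names are assumptions for illustration; the paper's exact parametrization of $s$ may differ.

```python
def scad(x, lam=1.0, a=3.7):
    """Standard SCAD penalty (parameters lam, a are illustrative assumptions)."""
    t = abs(x)
    if t <= lam:
        # Linear near the origin, like an l1 penalty.
        return lam * t
    if t <= a * lam:
        # Concave quadratic piece joining the linear part to the plateau.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant plateau for large |x|.
    return 0.5 * (a + 1) * lam * lam


def scad_constraint(xs, p, lam=1.0, a=3.7):
    """Constraint value g(x) = sum_i s(x_i) - p; feasible when g(x) <= 0."""
    return sum(scad(xi, lam, a) for xi in xs) - p
```

The middle piece is a concave quadratic, so this function is nonconvex but weakly convex, and it is flat (zero derivative) on the plateau.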

Theorems & Definitions (14)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Lemma 2.5
Definition 3.1
Theorem 3.1
Lemma 3.2
Corollary 3.3
Remark 1
...and 4 more
