Bregman Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Zeroth-order Optimization
Qiankun Shi, Xiao Wang, Hao Wang
TL;DR
This work addresses nonconvex constrained stochastic zeroth-order optimization, where the constraints are known exactly and only noisy function values of the objective are available. It introduces a single-loop Bregman linearized augmented Lagrangian method that uses a two-point zeroth-order gradient estimator with variance reduction, and analyzes the oracle complexity of reaching an \(\varepsilon\)-KKT point. Starting from a near-feasible initial point and using Rademacher smoothing, the complexity is \(O(p d^{2/p} \varepsilon^{-3})\) for \(p \in [2, 2\ln d]\) and \(O(\ln d \cdot \varepsilon^{-3})\) for \(p > 2\ln d\), where \(d\) is the problem dimension and \(p\) is determined by the chosen Bregman distance. This matches the lowest known order in \(\varepsilon\) while reducing the dependence on \(d\). Numerical experiments on constrained Lasso and black-box adversarial attacks validate the approach and demonstrate practical efficiency gains over existing zeroth-order methods.
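To make the idea of a two-point zeroth-order gradient estimator with Rademacher smoothing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a forward-difference form \((f(x + \mu u) - f(x))/\mu \cdot u\) with \(u\) drawn uniformly from \(\{-1, +1\}^d\); the function names, the smoothing radius `mu`, and the forward-difference choice are all illustrative assumptions, and the paper's variance-reduction step is not reproduced here.

```python
import itertools
import random


def two_point_estimate(f, x, u, mu):
    """Forward-difference estimate (f(x + mu*u) - f(x)) / mu * u along direction u.

    Assumption: f is one sampled realization of the noisy objective, evaluated
    at both points. The exact form used in the paper may differ.
    """
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(shifted) - f(x)) / mu
    return [scale * ui for ui in u]


def rademacher_direction(d, rng):
    """Draw a direction u uniformly from {-1, +1}^d."""
    return [rng.choice((-1.0, 1.0)) for _ in range(d)]


def exact_expectation(f, x, mu):
    """Average the estimator over all 2^d sign vectors (feasible for small d only)."""
    d = len(x)
    total = [0.0] * d
    for u in itertools.product((-1.0, 1.0), repeat=d):
        g = two_point_estimate(f, x, list(u), mu)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / 2 ** d for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    objective = lambda x: sum(xi * xi for xi in x)  # hypothetical test function
    point = [1.0, -2.0, 0.5]
    print(two_point_estimate(objective, point, rademacher_direction(3, rng), 0.1))
    print(exact_expectation(objective, point, 0.1))
```

Because Rademacher directions satisfy \(\mathbb{E}[u u^\top] = I\), no extra factor of \(d\) is needed in the estimate. For linear and quadratic test functions, averaging over all sign vectors recovers the true gradient up to rounding; for general smooth functions the estimator targets the gradient of a smoothed surrogate, with a bias that depends on `mu`.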
Abstract
In this paper, we study nonconvex constrained stochastic zeroth-order optimization problems, for which we have access to exact constraint information and noisy function values of the objective. We propose a Bregman linearized augmented Lagrangian method that utilizes stochastic zeroth-order gradient estimators combined with a variance reduction technique. We analyze its oracle complexity, measured as the total number of stochastic function value evaluations required to reach an \(\varepsilon\)-KKT point in \(\ell_p\)-norm metrics with \(p \ge 2\), where \(p\) is a parameter associated with the selected Bregman distance. In particular, starting from a near-feasible initial point and using Rademacher smoothing, the oracle complexity is of order \(O(p d^{2/p} \varepsilon^{-3})\) for \(p \in [2, 2 \ln d]\), and \(O(\ln d \cdot \varepsilon^{-3})\) for \(p > 2 \ln d\), where \(d\) denotes the problem dimension. These results show that the proposed method can achieve a dimension dependence lower than \(O(d)\) without additional assumptions, provided that the Bregman distance is chosen properly. This is a significant improvement over existing work in the high-dimensional setting, and it matches the lowest complexity order with respect to the tolerance \(\varepsilon\) reported in the literature. Numerical experiments on constrained Lasso and black-box adversarial attack problems highlight the promising performance of the proposed method.
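As a short worked check of how the two complexity regimes fit together (a reader's calculation from the stated bounds, assuming \(d \ge 3\) so that the interval \([2, 2\ln d]\) is nonempty), consider the dimension factor \(\phi(p) = p\, d^{2/p}\):

\[
\phi(2) = 2d, \qquad
\phi(2\ln d) = 2\ln d \cdot d^{1/\ln d} = 2\ln d \cdot e^{\ln d/\ln d} = 2e\ln d,
\]
\[
\frac{\mathrm{d}}{\mathrm{d}p}\,\ln \phi(p) = \frac{1}{p} - \frac{2\ln d}{p^{2}} \le 0
\quad\text{for } p \le 2\ln d .
\]

So the factor decreases from the linear dependence \(2d\) at \(p = 2\) to \(2e \ln d = O(\ln d)\) at \(p = 2\ln d\), which is where the first bound meets the \(O(\ln d \cdot \varepsilon^{-3})\) bound stated for \(p > 2\ln d\).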
