High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise

Yuchen Fang, Javad Lavaei, Sen Na

TL;DR

This work tackles constrained stochastic optimization with stochastic objective values and deterministic equality constraints. It introduces a trust-region stochastic sequential quadratic programming (TR-SSQP) method that uses probabilistic zeroth-, first-, and second-order oracles to estimate function values, gradients, and Hessians under irreducible, heavy-tailed noise. The authors prove high-probability iteration bounds: $\mathcal{O}(\varepsilon^{-2})$ iterations to obtain a first-order $\varepsilon$-stationary point and $\mathcal{O}(\varepsilon^{-3})$ iterations for a second-order $\varepsilon$-stationary point, with refinements for heavy-tailed and sub-exponential zeroth-order noise. They also analyze sample complexities and demonstrate finite-time almost-sure convergence when the noise moment parameter $\delta>1$. Numerical experiments on the CUTEst set corroborate the theory and highlight the benefits of Hessian-aware variants in practice.
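To make the notion of a noisy zeroth-order oracle concrete, here is a minimal toy sketch. It is not the paper's oracle definition: the function name `zeroth_order_estimate`, the centered Pareto noise model, the `tail_index` parameter, and the test objective are all assumptions chosen for illustration. The noise has finite moments only of order below `tail_index`, which is one simple way to produce heavy tails.

```python
import random


def zeroth_order_estimate(f, x, n_samples, rng, tail_index=2.5):
    """Toy noisy estimate of f(x): true value plus averaged heavy-tailed noise.

    Assumption (not from the paper): the noise is Pareto(tail_index) shifted
    to zero mean, so only moments of order below tail_index are finite.
    """
    if tail_index <= 1.0:
        raise ValueError("tail_index must exceed 1 so the noise mean exists")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Mean of a Pareto variable with scale 1 and shape tail_index.
    pareto_mean = tail_index / (tail_index - 1.0)
    noise = [rng.paretovariate(tail_index) - pareto_mean for _ in range(n_samples)]
    return f(x) + sum(noise) / n_samples


def objective(x):
    """Hypothetical smooth objective used only for this illustration."""
    return (x - 1.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 100, 10000):
        estimate = zeroth_order_estimate(objective, 2.0, n, rng)
        print(f"n_samples={n:>5}  estimate={estimate:.4f}  true={objective(2.0):.4f}")
```

Averaging more samples tends to shrink the error, but with heavy tails an occasional large draw can still dominate the mean. That is one reason guarantees under this kind of noise are stated in high probability rather than through sub-exponential tail bounds.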

Abstract

In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $\varepsilon$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods that rely on a zeroth-order oracle with sub-exponential tail noise (i.e., light-tailed) and focus mostly on first-order stationarity, our analysis accommodates irreducible and heavy-tailed noise in the zeroth-order oracle and significantly extends the analysis to second-order stationarity. We show that under heavy-tailed noise conditions, our SSQP method achieves the same high-probability first-order iteration complexity bounds as in the light-tailed noise setting, while further exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $\varepsilon$-stationary point in $\mathcal{O}(\varepsilon^{-2})$ iterations and a second-order $\varepsilon$-stationary point in $\mathcal{O}(\varepsilon^{-3})$ iterations with high probability, provided that $\varepsilon$ is lower bounded by a constant determined by the irreducible noise level in estimation. We validate our theoretical findings and evaluate the practical performance of our method on the CUTEst benchmark test set.
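For orientation, the problem class and the stationarity notions mentioned in the abstract can be written out as follows. This is a common textbook formulation and only a sketch: the symbols $F$, $\xi$, $d$, $\lambda$, $G$, and $Z$ are assumptions introduced here, and the paper's exact definitions may differ.

```latex
% Problem: stochastic objective, deterministic equality constraints
\min_{x \in \mathbb{R}^d} \; f(x) = \mathbb{E}\big[F(x;\xi)\big]
\quad \text{s.t.} \quad c(x) = 0.

% Lagrangian with multiplier \lambda
\mathcal{L}(x,\lambda) = f(x) + \lambda^{\top} c(x).

% First-order \varepsilon-stationarity (one common form)
\|\nabla_x \mathcal{L}(x,\lambda)\| \le \varepsilon,
\qquad \|c(x)\| \le \varepsilon.

% Second-order \varepsilon-stationarity (one common form): additionally,
% with G(x) = \nabla c(x)^{\top} the constraint Jacobian and the columns
% of Z(x) spanning the null space of G(x),
\lambda_{\min}\!\big( Z(x)^{\top} \nabla_x^2 \mathcal{L}(x,\lambda)\, Z(x) \big) \ge -\varepsilon.
```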

Paper Structure

This paper contains 38 sections, 19 theorems, 163 equations, 5 figures, 1 algorithm.

Key Result

Lemma 4.2

Under Assumption 4.1 with $\alpha=1$, there exists a positive constant $\kappa_B\geq 1$ such that $\|{\bar{H}}_k\|\leq \kappa_B$ on the event ${\mathcal{A}}_k\cap{\mathcal{B}}_k$.

Figures (5)

  • Figure 1: Averaged stopping time $T_{\epsilon}$ with noise from four different distributions. In every plot, the first four boxes correspond to TR-SSQP with different choices of ${\bar{H}}_k$. The fifth box corresponds to TR-SSQP2, and the last box corresponds to LS-SSQP.
  • Figure 2: Performance profiles with noise following a normal distribution. Each line represents a different method. The first column corresponds to the default irreducible noise levels with varying $\epsilon$. Each row of the last two columns corresponds to varying $\epsilon_f$, $\epsilon_g$ and $\epsilon_h$, respectively.
  • Figure 3: Performance profiles with noise following a $t$-distribution. Each line represents a different method. The first column corresponds to the default irreducible noise levels with varying $\epsilon$. Each row of the last two columns corresponds to varying $\epsilon_f$, $\epsilon_g$ and $\epsilon_h$, respectively.
  • Figure 4: Performance profiles with noise following a log-normal distribution. Each line represents a different method. The first column corresponds to the default irreducible noise levels with varying $\epsilon$. Each row of the last two columns corresponds to varying $\epsilon_f$, $\epsilon_g$ and $\epsilon_h$, respectively.
  • Figure 5: Performance profiles with noise following a Weibull distribution. Each line represents a different method. The first column corresponds to the default irreducible noise levels with varying $\epsilon$. Each row of the last two columns corresponds to varying $\epsilon_f$, $\epsilon_g$ and $\epsilon_h$, respectively.

Theorems & Definitions (28)

  • Definition 3.1: Probabilistic second-order oracle
  • Definition 3.2: Probabilistic first-order oracle
  • Definition 3.3: Probabilistic zeroth-order oracle
  • Remark 3.4
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • Lemma 4.5
  • Definition 4.6: Stopping time
  • Lemma 4.8
  • ...and 18 more