Complexity of Zeroth- and First-order Stochastic Trust-Region Algorithms

Yunsoo Ha, Sara Shashaani, Raghu Pasupathy

TL;DR

The paper investigates how Common Random Numbers (CRN) influence the sample and iteration complexity of zeroth- and first-order stochastic trust-region algorithms (ASTRO(-DF)). By analyzing MU and CE steps under CRN across zeroth- and first-order oracles and varying sample-path regularity, the authors derive complexity landscapes: without CRN, the rate is $\tilde{O}(ε^{-6})$ across cases, while CRN can yield dramatic improvements, up to $\tilde{O}(ε^{-2})$ a.s. in the first-order, smooth-path setting, and favorable reductions to $\tilde{O}(ε^{-5})$ or $\tilde{O}(ε^{-4})$ in other structured contexts. The improvements are largely attributed to general variance-reduction mechanisms, such as finite-difference error control and sample-path smoothness, rather than algorithmic specifics. The work provides a rigorous balance-condition-based analysis, strong consistency proofs, and detailed, case-dependent complexity results, with broader implications for the design of CRN-enabled stochastic TR methods in various domains.

Abstract

Model update (MU) and candidate evaluation (CE) are classical steps incorporated inside many stochastic trust-region (TR) algorithms. The sampling effort exerted within these steps, often decided with the aim of controlling model error, largely determines a stochastic TR algorithm's sample complexity. Given that MU and CE are amenable to variance reduction, we investigate the effect of incorporating common random numbers (CRN) within MU and CE on complexity. Using ASTRO and ASTRO-DF as prototype first-order and zeroth-order families of algorithms, we demonstrate that CRN's effectiveness leads to a range of complexities depending on sample-path regularity and the oracle order. For instance, we find that in first-order oracle settings with smooth sample paths, CRN's effect is pronounced -- ASTRO with CRN achieves $\tilde{O}(ε^{-2})$ a.s. sample complexity compared to $\tilde{O}(ε^{-6})$ a.s. in the generic no-CRN setting. By contrast, CRN's effect is muted when the sample paths are not Lipschitz, with the sample complexity improving from $\tilde{O}(ε^{-6})$ a.s. to $\tilde{O}(ε^{-5})$ and $\tilde{O}(ε^{-4})$ a.s. in the zeroth- and first-order settings, respectively. Since our results imply that improvements in complexity are largely inherited from generic aspects of variance reduction, e.g., finite-differencing for zeroth-order settings and sample-path smoothness for first-order settings within MU, we anticipate similar trends in other contexts.
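
The variance-reduction effect of CRN on finite differencing can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy oracle `F(x, xi) = (x - xi)**2` with `xi ~ N(0, 1)` (a smooth sample path), the forward-difference step `h`, and the helper names `fd_gradient` and `estimator_variance`. It is not ASTRO or ASTRO-DF; it only compares a finite-difference gradient estimate built from shared random numbers against one built from independent draws.

```python
import random
import statistics

def F(x, xi):
    """Hypothetical smooth sample-path oracle; E[F(x, xi)] = x**2 + 1 for xi ~ N(0, 1)."""
    return (x - xi) ** 2

def fd_gradient(x, h, n, rng, crn):
    """Forward finite-difference gradient estimate averaged over n replications."""
    total = 0.0
    for _ in range(n):
        xi_a = rng.gauss(0.0, 1.0)
        # With CRN both design points see the same random number;
        # without CRN the second point gets an independent draw.
        xi_b = xi_a if crn else rng.gauss(0.0, 1.0)
        total += (F(x + h, xi_a) - F(x, xi_b)) / h
    return total / n

def estimator_variance(crn, reps=200, x=1.0, h=0.1, n=50, seed=0):
    """Empirical variance of the gradient estimator over repeated experiments."""
    rng = random.Random(seed)
    estimates = [fd_gradient(x, h, n, rng, crn) for _ in range(reps)]
    return statistics.variance(estimates)

var_crn = estimator_variance(crn=True)
var_ind = estimator_variance(crn=False)
print(f"variance with CRN:    {var_crn:.4f}")
print(f"variance without CRN: {var_ind:.4f}")
```

In this toy setting the CRN difference `F(x + h, xi) - F(x, xi)` equals `2*h*(x - xi) + h**2`, so its variance scales with `h**2` and survives division by `h`, whereas the independent-draw difference has variance that does not shrink with `h`. That is the generic mechanism the abstract refers to, shown here only for one assumed oracle.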

Paper Structure (31 sections, 13 theorems, 76 equations, 1 figure, 1 table, 2 algorithms)

Key Result

Lemma 2.5

Suppose $(\xi_i,\mathcal{F}_i)_{i =0,1,\ldots}$ is a martingale difference sequence on some probability space $(\Xi,\mathcal{F},P)$, with $\xi_0=0$ and $\{\Xi,\emptyset\} = \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$ ...

Figures (1)

  • Figure 1: An example problem adapted from 2022raghunpastaa, to estimate the expected waiting time $\mathbb{E}[F(x,\xi)]$ of bus passengers arriving according to a Poisson process $(\xi)$ as a function of bus schedule $x$ in a fixed time interval $[0,30]$. Notice that the estimated wait time $\bar{F}(x,n)$ with CRN better retains the expected function's smoothness property.
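
The qualitative behavior described in the Figure 1 caption can be reproduced with a rough sketch. The model below is a simplified assumption, not the paper's experiment: a single intermediate bus departs at time `x`, a final bus departs at the end of the interval `[0, 30]`, passengers arrive at an assumed rate of 1 per unit time, and the names `poisson_arrivals`, `mean_wait`, `estimate_curve`, and `roughness` are hypothetical. The sketch estimates the wait-time curve on a grid of `x` values, once reusing the same arrival paths at every `x` (CRN) and once redrawing them at every `x`.

```python
import random

def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def mean_wait(x, arrivals, horizon=30.0):
    """Average wait when one bus leaves at time x and a last bus at horizon."""
    if not arrivals:
        return 0.0
    waits = [(x - t) if t <= x else (horizon - t) for t in arrivals]
    return sum(waits) / len(waits)

def estimate_curve(grid, n, crn, seed=0, rate=1.0, horizon=30.0):
    """Estimate the wait-time curve on grid from n arrival paths per point."""
    rng = random.Random(seed)
    paths = [poisson_arrivals(rate, horizon, rng) for _ in range(n)]
    curve = []
    for x in grid:
        if not crn:
            # Independent sampling: draw fresh arrival paths for every x.
            paths = [poisson_arrivals(rate, horizon, rng) for _ in range(n)]
        curve.append(sum(mean_wait(x, p, horizon) for p in paths) / n)
    return curve

def roughness(curve):
    """Sum of squared second differences; smaller means a smoother curve."""
    return sum((curve[i + 1] - 2.0 * curve[i] + curve[i - 1]) ** 2
               for i in range(1, len(curve) - 1))

grid = [1.0 + 0.5 * i for i in range(57)]  # x = 1.0, 1.5, ..., 29.0
crn_curve = estimate_curve(grid, n=20, crn=True)
ind_curve = estimate_curve(grid, n=20, crn=False)
print(f"roughness with CRN:    {roughness(crn_curve):.3f}")
print(f"roughness without CRN: {roughness(ind_curve):.3f}")
```

In this assumed model each individual sample path is still discontinuous in `x` (a passenger's wait jumps when the bus time crosses their arrival time), so the CRN curve is not perfectly smooth. It is only expected to be noticeably less rough than the independently sampled one, which is the sense in which the caption says CRN better retains smoothness.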

Theorems & Definitions (39)

  • Remark 1: Applicability of CRN
  • Definition 2.1: Slowly Varying Sequence and Function
  • Definition 2.2: Orders $\mathcal{O}$ and $\widetilde{\mathcal{O}}$
  • Definition 2.3: Consistency
  • Definition 2.4: Iteration Complexity and Sample Complexity
  • Lemma 2.5: Bernstein's Inequality for Martingales
  • proof
  • Theorem 2.6: Variance of Function Difference
  • proof
  • Lemma 2.7: Borel-Cantelli's First Lemma
  • ...and 29 more