On the Convergence and Complexity of the Stochastic Central Finite-Difference Based Gradient Estimation Methods

Raghu Bollapragada; Cem Karamanli

TL;DR

This work addresses unconstrained stochastic optimization when gradients are inaccessible by proposing a central finite-difference gradient estimation framework with adaptive sampling under common random numbers. It provides a unified analysis for nonconvex objectives, showing sublinear convergence to a neighborhood and optimal worst-case iteration and sample complexities $O\left(\epsilon^{-1}\right)$ and $O\left(\epsilon^{-2}\right)$, respectively. The study compares multiple central finite-difference variants (cFD, cGS, cSS, cRC, cRS), detailing their convergence behavior and dimension-related trade-offs, supported by numerical experiments on nonlinear least squares. The results have practical implications for scalable, derivative-free stochastic optimization, with potential extensions to quasi-Newton or accelerated methods.
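
To make the estimator concrete, here is a minimal Python sketch of a coordinate-wise central finite-difference gradient estimate under common random numbers, where both evaluations in each difference reuse the same random realization. The oracle `noisy_quadratic`, the name `cfd_gradient`, and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import random


def noisy_quadratic(x, xi):
    # Hypothetical stochastic oracle: f(x) = sum(x_i^2) observed with
    # additive noise xi. Chosen only to keep the example easy to check.
    return sum(v * v for v in x) + xi


def cfd_gradient(F, x, nu, samples):
    """Average coordinate-wise central differences over a sample set.

    F       : stochastic oracle F(x, xi)
    x       : current point (list of floats)
    nu      : finite-difference interval (assumed > 0)
    samples : random realizations; each one is reused for BOTH the
              forward and backward evaluation (common random numbers)
    """
    d = len(x)
    grad = [0.0] * d
    for xi in samples:
        for i in range(d):
            x_plus = list(x)
            x_minus = list(x)
            x_plus[i] += nu
            x_minus[i] -= nu
            grad[i] += (F(x_plus, xi) - F(x_minus, xi)) / (2.0 * nu)
    return [g / len(samples) for g in grad]


rng = random.Random(0)
samples = [rng.gauss(0.0, 1e-3) for _ in range(8)]  # sample set S_k
x = [1.0, -2.0, 0.5]
g = cfd_gradient(noisy_quadratic, x, nu=1e-2, samples=samples)
print(g)
```

In this toy setting the additive noise cancels inside each difference because the same realization is used on both sides, and a central difference is exact on a quadratic, so the estimate matches the true gradient $2x$ up to floating-point rounding. Each sample costs $2d$ function evaluations.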

Abstract

This paper presents an algorithmic framework for solving unconstrained stochastic optimization problems using only stochastic function evaluations. We employ central finite-difference based gradient estimation methods to approximate the gradients and dynamically control the accuracy of these approximations by adjusting the sample sizes used in stochastic realizations. We analyze the theoretical properties of the proposed framework on nonconvex functions. Our analysis yields sublinear convergence results to the neighborhood of the solution, and establishes the optimal worst-case iteration complexity ($\mathcal{O}(\epsilon^{-1})$) and sample complexity ($\mathcal{O}(\epsilon^{-2})$) for each gradient estimation method to achieve an $\epsilon$-accurate solution. Finally, we demonstrate the performance of the proposed framework and the quality of the gradient estimation methods through numerical experiments on nonlinear least squares problems.
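
The gradient estimation methods compared in the paper include randomized variants that probe a chosen number of random directions $N$ instead of all $d$ coordinates. As a rough illustration, the sketch below implements a textbook central Gaussian-smoothing style estimator. The paper's exact definitions are not reproduced in this summary, so the formula, the name `cgs_gradient`, the deterministic test function `linear`, and the parameter values should all be read as assumptions.

```python
import random


def cgs_gradient(f, x, nu, num_dirs, rng):
    """Central Gaussian-smoothing style gradient estimate (illustrative).

    Averages ((f(x + nu*u) - f(x - nu*u)) / (2*nu)) * u over num_dirs
    directions u drawn from a standard normal distribution.
    """
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + nu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - nu * ui for xi, ui in zip(x, u)]
        # Directional slope along u from one central difference.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * nu)
        for i in range(d):
            grad[i] += slope * u[i]
    return [g / num_dirs for g in grad]


def linear(x):
    # Hypothetical noise-free test function with gradient (1, 2).
    return 1.0 * x[0] + 2.0 * x[1]


rng = random.Random(0)
g_est = cgs_gradient(linear, [0.3, -0.7], nu=1e-2, num_dirs=20000, rng=rng)
print(g_est)
```

Each direction costs two function evaluations, so the estimator uses $2N$ evaluations per sample rather than $2d$. The price is extra variance from the random directions, which is one form of the dimension-related trade-off the summary mentions.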

Paper Structure (30 sections, 4 theorems, 37 equations, 33 figures, 2 tables)

This paper contains 30 sections, 4 theorems, 37 equations, 33 figures, 2 tables.

Introduction
Literature Review
Notation
Preliminaries
Theoretical Results
Convergence Results
Complexity Results
Numerical Experiments
Final Remarks
Appendix
Bounded Variance in \ref{eq:theoreticalnormcond}
Additional Plots
Nonlinear Least Squares Problems
Chebyquad Function $(d = 30, p = 45)$ with Relative Error, $\sigma = 10^{-3}$
Chebyquad Function $(d = 30, p = 45)$ with Absolute Error, $\sigma = 10^{-3}$
...and 15 more sections

Key Result

Lemma 1

For any $x_0 \in \mathbb{R}^d$, let $\{x_k : k \in \mathbb{Z}_{++}\}$ be generated by iteration eq:iter with $|S_k|$ satisfying Condition cond:theoreticalnormcond for a given constant $\theta > 0$. Suppose that Assumptions assum:Lipschitzstochf, assum:boundedvarinstochgrad, and assum:sampling hold. Then we have a bound (the displayed inequality is not preserved here) in which $\bar{\alpha}_k, \chi_k > 0$ are given in Table tbl:unified.

Figures (33)

Figure 1: Performance of different gradient estimation methods using the tuned hyperparameters on the Bdqrtic function with $\sigma = 10^{-3}$. Top row: absolute error, bottom row: relative error.
Figure 2: Performance of different gradient estimation methods using the tuned hyperparameters on the Cube function with $\sigma = 10^{-3}$. Top row: absolute error, bottom row: relative error.
Figure 3: The effect of number of directions $N$ on the performance of different randomized gradient estimation methods on the Bdqrtic function with $\sigma = 10^{-3}$. Sampling radius $\nu$ and step size $\alpha$ are tuned for each method and $N$ combination to achieve the best performance. Top row: absolute error, bottom row: relative error.
Figure 4: Performance of different gradient estimation methods using the tuned hyperparameters on the Chebyquad function with relative error and $\sigma = 10^{-3}$.
Figure 5: The effect of number of directions $N$ on the performance of different randomized gradient estimation methods on the Chebyquad function with relative error and $\sigma = 10^{-3}$. The sampling radius $\nu$ and step size $\alpha$ are tuned for each method and $N$ combination to achieve the best performance.
...and 28 more figures

Theorems & Definitions (9)

Lemma 1
proof
Theorem 1
proof
Definition 1
Lemma 2
proof
Theorem 2
proof
