Stochastic gradient with least-squares control variates

Fabio Nobile, Matteo Raviola, Nathan Schaeffer

TL;DR

This work tackles stochastic optimization where the objective is an expectation $J(u) = \mathbb{E}_{Y\sim\rho}[g(u,Y)]$, a setting where gradient evaluations are expensive and the distribution $\rho$ is known. It introduces SG-LSCV, a memory-based control-variate method that fits a linear gradient surrogate via optimal weighted least-squares using past samples, then uses this surrogate to construct a variance-reduced gradient update that preserves the per-iteration cost of SGD. The authors prove convergence guarantees for both fixed and growing approximation spaces, showing exponential or algebraic decay depending on the gradient-projection error and step-size scheduling, and they demonstrate the approach on PDE-constrained optimization problems with uncertainties. The results indicate substantial improvements over SGD and finite-sum variance-reduction methods like SAGA, especially in high-dimensional or continuous-parameter settings, by exploiting gradient regularity through structured polynomial approximations and optimal sampling. The method offers a scalable framework for variance reduction in continuous stochastic optimization and has potential applications in ML contexts where the data-generating distribution is known and gradient smoothness can be exploited.
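
For orientation, a generic control-variate gradient estimator of the kind described above can be written as follows. This is a sketch in assumed notation: the surrogate $\widehat{G}_k$, the iterate $u_k$, the sample $Y_k$ and the step size $\tau_k$ are illustrative symbols, and the paper's exact formulation may differ.

$$
u_{k+1} = u_k - \tau_k \Big( \nabla g(u_k, Y_k) - \widehat{G}_k(Y_k) + \mathbb{E}_{Y\sim\rho}\big[\widehat{G}_k(Y)\big] \Big)
$$

If $\widehat{G}_k$ is built only from past samples, and is therefore independent of $Y_k$, the bracketed term is an unbiased estimate of $\nabla J(u_k)$. Its variance is bounded by $\mathbb{E}_{Y\sim\rho}\big[\lVert \nabla g(u_k,Y) - \widehat{G}_k(Y) \rVert^2\big]$, so a better surrogate gives a smaller variance. Because $\rho$ is known and $\widehat{G}_k$ lies in a polynomial space, the mean $\mathbb{E}_{Y\sim\rho}[\widehat{G}_k(Y)]$ can be computed without further gradient evaluations.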

Abstract

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by leveraging stored gradient information; however, they are restricted to settings where the objective functional is a finite sum, and their performance degrades when the number of terms in the sum is large. In this work, we propose a novel approach which is well suited when the objective is given by an expectation over random variables with a continuous probability distribution. Our method constructs a control variate by fitting a linear model to past gradient evaluations using weighted discrete least-squares, effectively reducing variance while preserving computational efficiency. We establish theoretical sublinear convergence guarantees for strongly convex objectives and demonstrate the method's effectiveness through numerical experiments on random PDE-constrained optimization problems.
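
The idea in the abstract can be illustrated with a minimal Python sketch on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic objective in `grad_g`, the names `basis` and `run`, the orthonormal Legendre basis, the sliding memory window, and the fixed step size. The sketch also fits the surrogate with plain unweighted least squares on samples drawn from $\rho$, whereas the paper uses weighted discrete least-squares.

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical toy problem (not from the paper):
# g(u, y) = 0.5*a(y)*|u|^2 - b(y).u with Y ~ Uniform(-1, 1),
# so grad_u g(u, y) = a(y)*u - b(y) and J is minimised at E[b(Y)] / E[a(Y)].
U_STAR = np.array([np.sin(1.0), np.sinh(1.0)])  # E[cos Y], E[exp Y]; E[a(Y)] = 1


def grad_g(u, y):
    """Sample gradient of the toy objective at control u and sample y."""
    a = 1.0 + 0.5 * y
    b = np.array([np.cos(y), np.exp(y)])
    return a * u - b


def basis(y, m):
    """First m Legendre polynomials, orthonormal w.r.t. Uniform(-1, 1)."""
    V = legendre.legvander(np.atleast_1d(y), m - 1)  # shape (n, m)
    return V * np.sqrt(2.0 * np.arange(m) + 1.0)


def run(n_iter=2000, m=6, memory=60, tau=0.1, use_cv=True, seed=0, tail=200):
    """Return the mean error |u_k - u*| over the last `tail` iterations."""
    rng = np.random.default_rng(seed)
    u = np.zeros(2)
    ys, grads, errors = [], [], []
    for _ in range(n_iter):
        y = rng.uniform(-1.0, 1.0)
        G = grad_g(u, y)  # the only gradient evaluation in this iteration
        direction = G     # plain SGD direction
        if use_cv and len(ys) >= memory:
            # Fit a linear-in-coefficients surrogate to PAST gradients only,
            # so the surrogate is independent of the current sample y.
            Phi = basis(np.array(ys[-memory:]), m)
            C, *_ = np.linalg.lstsq(Phi, np.array(grads[-memory:]), rcond=None)
            # The basis is orthonormal with constant first element, so
            # E[basis(Y)] = (1, 0, ..., 0) and the surrogate mean is C[0].
            direction = G - basis(y, m)[0] @ C + C[0]
        u = u - tau * direction
        ys.append(y)
        grads.append(G)
        errors.append(np.linalg.norm(u - U_STAR))
    return float(np.mean(errors[-tail:]))


if __name__ == "__main__":
    print("SGD tail error:         ", run(use_cv=False))
    print("Control-variate version:", run(use_cv=True))
```

On this toy problem, the control-variate run should settle at a much lower error plateau than plain SGD with the same fixed step size, because the sample gradient is smooth in `y` and is well captured by a low-degree polynomial. The stored gradients were evaluated at earlier iterates, so the surrogate is only an approximation of the current gradient; the estimator stays unbiased regardless, since the surrogate's mean is added back.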

Paper Structure

This paper contains 27 sections, 217 equations, 5 figures, 4 algorithms.

Figures (5)

  • Figure 1: Convergence behavior of the SG-LSCV algorithm with fixed polynomial approximation spaces of dimensions $m \in \{6,16,21\}$, for different choices of the step size.
  • Figure 2: Comparison of SGD, SAGA, and two versions of SG-LSCV with variable approximation spaces.
  • Figure 3: Estimate of the error of the polynomial approximation of the gradient $\left\lVert(I-\Pi^{{\mathbb{V}}_m}) [\nabla g(u,\cdot)]\right\rVert_{L^2_\rho(\Gamma;L^2(D))}$ for $u$ the optimal control and $u(x)=\sin(x_1)\sin(x_2)$.
  • Figure 4: Convergence of the SG-LSCV algorithm with fixed approximation spaces ${\mathbb{V}}_m$ with $m\in\{2,5,9\}$ on the $5$-dimensional problem. The plot shows the exponential moving average of the error $\left\lVert u_k-u\right\rVert_{L^2(D)}$.
  • Figure 5: Comparison of stochastic optimization methods in 5 dimensions. The plot shows the exponential moving average of the error $\left\lVert u_k-u\right\rVert_{L^2(D)}$ over $10^5$ iterations for SAGA with $5$ and $8$ quadrature points, and SG-LSCV with variable approximation spaces. The full gradient method was run with step size $\tau=100$ and is included as a reference baseline.

Theorems & Definitions (15)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 5 more