
Stochastic Optimization with Optimal Importance Sampling

Liviu Aolaritei, Bart P. G. Van Parys, Henry Lam, Michael I. Jordan

TL;DR

The paper addresses efficient stochastic optimization when the sampling distribution used for gradient estimation must be calibrated together with the decision variables, creating a circular dependency. It proposes a single-loop stochastic approximation method that jointly updates the decision variable and the importance sampling (IS) parameters via a joint variant of Nesterov's dual averaging (NDA), without time-scale separation or nested loops, and proves global convergence plus asymptotic variance optimality within the IS class. The theoretical results establish almost sure convergence, finite-time identification of the active constraints, and central limit theorems (CLTs) for the coupled iterates; the averaged decision iterates attain the minimal asymptotic variance, as if an oracle IS distribution adapted to the optimal solution were available. A numerical example on rare-event quantile estimation confirms substantial variance reductions and the practical effectiveness of the approach.
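The finite-time identification of active constraints mentioned above can be seen on a toy problem. The sketch below is our own construction, not the paper's experiment: it minimizes f(θ) = θ over [0, 1] (so the constraint θ ≥ 0 is active at θ⋆ = 0) from noisy gradients 1 + ξ with ξ ~ N(0, 4), and compares projected SGD with a simplified Euclidean form of dual averaging that projects the prox point of the accumulated gradient sum (often called lazy projection). The helper `run`, the step sizes c n^(-0.6) and all constants are hypothetical choices.

```python
import random


def run(n_iters=10_000, c=1.0, rho=0.6, sigma=2.0, seed=0):
    """Toy problem: minimize f(theta) = theta over [0, 1] from gradients 1 + noise.

    theta* = 0 sits on the active constraint theta >= 0. Returns, for projected
    SGD and for lazy-projection dual averaging, the fraction of iterates in the
    second half of the run that lie exactly on that constraint.
    """
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    theta0 = 0.5                      # starting point and prox center

    def proj(t):
        return min(max(t, lo), hi)    # Euclidean projection onto [lo, hi]

    theta_sgd = theta0                # projected SGD iterate
    z = 0.0                           # dual averaging: weighted gradient sum
    hits_sgd = hits_nda = 0
    for n in range(1, n_iters + 1):
        a = c * n ** (-rho)                    # step size a_n = c n^(-rho)
        g = 1.0 + rng.gauss(0.0, sigma)        # same noisy gradient for both methods
        theta_sgd = proj(theta_sgd - a * g)    # SGD: project after every step
        z += a * g                             # dual averaging: accumulate gradients
        theta_nda = proj(theta0 - z)           # project the prox point of the sum
        if n > n_iters // 2:
            hits_sgd += int(theta_sgd == lo)
            hits_nda += int(theta_nda == lo)
    second_half = n_iters - n_iters // 2
    return hits_sgd / second_half, hits_nda / second_half
```

With `sgd_frac, nda_frac = run()`, dual averaging sits exactly on the boundary for every iterate in the second half, while projected SGD leaves it whenever a single noisy gradient points inward (here whenever ξ < -1, about 31% of draws). The reason is that the dual sum drifts past the boundary at rate Σ a_n, whereas its noise part converges because Σ a_n² < ∞ for exponent 0.6.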

Abstract

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its effectiveness, the performance of IS is highly sensitive to the choice of the proposal distribution and often requires stochastic calibration. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a fundamental challenge: the decision variable and the importance sampling distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both convergence analysis and variance control. We consider convex stochastic optimization problems with linear constraints and propose a single-loop stochastic approximation algorithm, based on a joint variant of Nesterov's dual averaging, that jointly updates the decision variable and the importance sampling distribution, without time-scale separation or nested optimization. The method is globally convergent and achieves minimal asymptotic variance among stochastic gradient schemes, matching the performance of an oracle sampler adapted to the optimal solution.
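A minimal single-loop sketch of the joint update, under assumptions that are ours and not the authors': the target is the α = 0.95 quantile of X ~ N(0, 1), written as the minimizer of the expected pinball loss; the IS family is mean translation, with proposal N(μ, 1) and likelihood ratio w(x) = exp(-μx + μ²/2); μ follows an unbiased stochastic gradient of the second moment of the tail estimator (rescaled by 1/(1-α)²); both variables are confined to hand-picked boxes (a simple case of linear constraints); and each coordinate takes a simplified Euclidean dual-averaging step. The function `joint_nda_is` and every constant in it are hypothetical.

```python
import math
import random
from statistics import NormalDist


def joint_nda_is(alpha=0.95, n_iters=100_000, seed=0):
    """Hypothetical single-loop sketch: jointly update the decision variable theta
    (the alpha-quantile of X ~ N(0, 1)) and a mean-translation IS parameter mu.

    theta: the pinball-loss gradient P(X < theta) - alpha, rescaled by 1/(1 - alpha),
           is estimated by 1 - w(x) 1{x >= theta} / (1 - alpha) with x ~ N(mu, 1).
    mu:    gradient of M(mu; theta) / (1 - alpha)^2, where
           M = E_q[(w 1{X >= theta})^2] is the tail estimator's second moment.
    """
    rng = random.Random(seed)
    p = 1.0 - alpha
    th_lo, th_hi = 1.0, 4.0            # box constraints on theta (assumed)
    mu_lo, mu_hi = 0.0, 3.0            # box constraints on mu (assumed)
    th0, mu0 = 1.0, 1.0                # prox centers, also the initial iterates
    c_th, c_mu, rho = 0.2, 0.01, 0.6   # step sizes c * n^(-rho), hand-tuned
    z_th = z_mu = 0.0                  # dual variables: weighted gradient sums
    theta, mu = th0, mu0
    theta_sum = mu_sum = 0.0
    for n in range(1, n_iters + 1):
        x = rng.gauss(mu, 1.0)         # one draw from the current proposal
        if x >= theta:
            w = math.exp(-mu * x + 0.5 * mu * mu)   # phi(x) / phi(x - mu)
            g_th = 1.0 - w / p
            g_mu = w * w * (mu - x) / (p * p)       # unbiased for (dM/dmu) / p^2
        else:
            g_th, g_mu = 1.0, 0.0
        a = n ** (-rho)                # same decay rate for both coordinates
        z_th += c_th * a * g_th
        z_mu += c_mu * a * g_mu
        # Both coordinates move in the same iteration from the same sample.
        theta = min(max(th0 - z_th, th_lo), th_hi)
        mu = min(max(mu0 - z_mu, mu_lo), mu_hi)
        theta_sum += theta
        mu_sum += mu
    return theta_sum / n_iters, mu_sum / n_iters


true_q = NormalDist().inv_cdf(0.95)    # reference value for the default alpha
```

One draw x ~ N(μ_n, 1) feeds both gradient estimates, so there is no inner calibration loop, and both step sizes decay at the same rate n^(-0.6), differing only by constants. The averaged decision iterate returned by `joint_nda_is()` should land near `true_q` ≈ 1.645: the θ-gradient stays unbiased whatever value μ takes, and adapting μ only changes its variance.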

Paper Structure

This paper contains 23 sections, 20 theorems, 153 equations, 6 figures.

Key Result

Lemma 2.4 (Projected gradient CLT)

Let Assumptions assump:SO, assump:SO:twice:differentiable through assump:SO:unique be satisfied. Then any iterate sequence $\bar{\theta}_n$ satisfying the CLT (eq:LB:CLT) must also satisfy the CLT
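Projected CLTs of this kind are commonly stated with a sandwich covariance restricted to the tangent space of the active constraints. The sketch below computes the standard form $U (U^\top H U)^{-1} U^\top \Sigma U (U^\top H U)^{-1} U^\top$, where the columns of $U$ form an orthonormal basis of the null space of $A^\star_a$, $H$ is the Hessian at $\theta^\star$ and $\Sigma$ is the gradient-noise covariance there. We have not verified that this matches the exact limit in Lemma 2.4, whose statement is cut off above; `restricted_covariance` is a hypothetical helper.

```python
import numpy as np


def restricted_covariance(H, Sigma, A_active, tol=1e-12):
    """Sandwich covariance of averaged iterates restricted to {theta : A_active theta = b}.

    H: Hessian of the objective at theta*, assumed positive definite on null(A_active).
    Sigma: covariance of the stochastic gradient at theta*.
    """
    _, s, vt = np.linalg.svd(A_active)        # full_matrices=True: vt is d x d
    rank = int(np.sum(s > tol))
    U = vt[rank:].T                           # orthonormal basis of null(A_active)
    H_red_inv = np.linalg.inv(U.T @ H @ U)    # inverse of the reduced Hessian
    V = H_red_inv @ (U.T @ Sigma @ U) @ H_red_inv
    return U @ V @ U.T                        # lift back to the ambient space
```

For H = diag(2, 1, 4), Σ = I and the single active constraint θ1 + θ2 = b, the covariance assigns zero variance to the direction (1, 1, 0) fixed by the constraint, and the result does not depend on which orthonormal basis the SVD returns.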

Figures (6)

  • Figure 1: Two constrained stochastic optimization problems. The minimizer $\theta^\star$ of problem (eq:so) is characterized as the minimizer of the problem restricted to the active constraint set $\left\{ \theta : A^\star_a\theta-b^\star_a=0 \right\}$. Likewise, the minimizer $\mu^\star$ of problem (eq:optimal:IS:2) is characterized as the minimizer restricted to the active constraint set $\left\{ \mu : C^\star_a\mu-d^\star_a=0 \right\}$.
  • Figure 2: Evolution of the decision variable. The dashed red line indicates $\theta^\star$. NDA (Nesterov's dual averaging) with IS exhibits rapid variance contraction once the IS parameter adapts, whereas projected SGD and vanilla NDA do not concentrate comparably.
  • Figure 3: Raw iterates $\mu_n$
  • Figure 4: Averaged iterates $\bar{\mu}_n$
  • Figure 6: Without burn-in
  • ...and 1 more figure
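The geometry in Figure 1 can be checked on a deterministic toy quadratic (our own example; the paper's objectives are expectations): once the active rows $A^\star_a, b^\star_a$ are known, the inequality-constrained minimizer solves an equality-constrained problem whose KKT conditions form a linear system. `restricted_minimizer` is a hypothetical helper.

```python
import numpy as np


def restricted_minimizer(H, c, A_active, b_active):
    """Minimize 0.5 theta^T H theta - c^T theta subject to A_active theta = b_active.

    Stationarity H theta - c + A^T lam = 0 and feasibility A theta = b form the
    linear KKT system [[H, A^T], [A, 0]] [theta; lam] = [c; b].
    """
    d, k = H.shape[0], A_active.shape[0]
    K = np.block([[H, A_active.T], [A_active, np.zeros((k, k))]])
    sol = np.linalg.solve(K, np.concatenate([c, b_active]))
    theta, lam = sol[:d], sol[d:]
    # lam >= 0, together with feasibility of the other rows, confirms the active set.
    return theta, lam
```

For example, minimize ½‖θ‖² - 2θ1 - 2θ2 subject to θ1 + θ2 ≤ 2 and θ1 ≤ 5. The unconstrained minimizer (2, 2) is infeasible, the first constraint is active, and `restricted_minimizer(np.eye(2), np.array([2.0, 2.0]), np.array([[1.0, 1.0]]), np.array([2.0]))` returns θ = (1, 1) with multiplier λ = 1 ≥ 0, while θ1 ≤ 5 stays slack.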

Theorems & Definitions (39)

  • proof
  • Remark 2.3: Projected SA
  • Lemma 2.4: Projected gradient CLT
  • proof
  • Remark 2.5: Sample average approximation
  • Lemma 3.2: Exponential tilting
  • Example 3.3: Normal Quantile Estimation
  • Lemma 3.4: Mean translation
  • Example 3.5: Exponential Quantile Estimation
  • Lemma 3.6: Mixture models
  • ...and 29 more
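Lemma 3.4 (mean translation) and Example 3.3 (normal quantile estimation) are not reproduced in this summary. As an independent sanity check of why the IS parameter matters, here is a closed form we derived ourselves. For X ~ N(0, 1), proposal N(μ, 1) and w(x) = exp(-μx + μ²/2), completing the square gives E_q[w(X)² 1{X ≥ θ}] = E_p[w(X) 1{X ≥ θ}] = exp(μ²) P(Z ≥ θ + μ). Dividing by P(X ≥ θ)² yields the relative second moment R(μ) of the tail-probability estimator, whose relative variance is R(μ) - 1. The function names and the grid are hypothetical.

```python
import math
from statistics import NormalDist


def tail_prob(t):
    """P(Z >= t) for Z ~ N(0, 1), via erfc to keep precision in the far tail."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def relative_second_moment(mu, theta):
    """R(mu) = E_q[(w 1{X >= theta})^2] / P(X >= theta)^2 under proposal q = N(mu, 1).

    Uses the closed form E_q[w^2 1{X >= theta}] = exp(mu^2) * P(Z >= theta + mu).
    """
    return math.exp(mu * mu) * tail_prob(theta + mu) / tail_prob(theta) ** 2


alpha = 0.95
theta = NormalDist().inv_cdf(alpha)        # true 0.95-quantile of N(0, 1)
grid = [i / 100 for i in range(301)]       # candidate shifts mu in [0, 3]
best_mu = min(grid, key=lambda m: relative_second_moment(m, theta))
r0 = relative_second_moment(0.0, theta)    # plain Monte Carlo (no shift)
r_best = relative_second_moment(best_mu, theta)
```

At the 0.95 quantile, R(0) = 1/0.05 = 20 (relative variance 19), while the grid minimizer lies between 1.5 and 2.5 with R of roughly 2.9, about a tenfold variance reduction. R is log-convex in μ, so the grid minimizer sits next to the unique optimum.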