Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality

Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

TL;DR

The paper develops a rigorous stochastic approximation framework for decision-dependent problems modeled by variational inequalities, where the data distribution responds to the current decision. It proves that the averaged stochastic forward-backward iterates are asymptotically normal, with a covariance that cleanly splits gradient-noise effects from distributional drift, and identifies the Jacobian-driven matrix $\nabla R(x^\star)$ that governs the limit. Moreover, leveraging Hájek–Le Cam theory, the authors establish a local minimax lower bound and show that the averaged SFB estimator attains this bound, proving local asymptotic optimality for both single-agent and multiplayer performative settings. The results extend to minibatch variants and provide a principled understanding of how distributional shifts affect asymptotic uncertainty, informing both theory and practice in performative prediction. Overall, the work clarifies the fundamental limits and efficiency of learning procedures under decision-dependent environments.

Abstract

We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotically normal, with a covariance that clearly decouples the effects of the gradient noise and the distributional shift. Moreover, building on the work of Hájek and Le Cam, we show that the asymptotic performance of the algorithm with averaging is locally minimax optimal.

Paper Structure (33 sections, 28 theorems, 216 equations, 1 figure, 1 algorithm)

Key Result

Theorem 1.1

Suppose that $G(\cdot,z)$ is $\alpha$-strongly monotone and Lipschitz continuous on $\mathcal{X}$, $G(x,\cdot)$ is $\beta$-Lipschitz continuous on $\mathcal{Z}$, and the distribution map $\mathcal{D}(\cdot)$ is $\gamma$-Lipschitz continuous on $\mathcal{X}$ with respect to the Wasserstein-1 distance where

Figures (1)

  • Figure 1: Consider the problem corresponding to $G(x, z) = \nabla_x \ell (x, z)$ with $\ell(x, z) = \frac{1}{2} \|x-z\|^{2}$ and $\mathcal{D}(x_{1}, x_{2}) = \mathsf{N}(\rho (x_{2}, x_{1}), I_2)$. A simple computation shows $\Sigma = I_{2}$ and $W = \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix}$. As $\rho$ approaches one, $W^{-1}$ becomes ill-conditioned. We run algorithm \ref{eqn:VI_iteration} $400$ times using $\eta_{t} = t^{-3/4}$ for $10^6$ iterations. The first row depicts the resulting average iterates laid over the confidence regions (plotted in logarithmic scale) corresponding to the asymptotic normal distribution. The next two rows depict kernel density estimates from the asymptotic normal distribution (top) and the deviation $\sqrt{k} (\bar{x}_{k} - x^\star)$ (bottom).
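
The experiment in Figure 1 can be reproduced at small scale with the sketch below. It is illustrative only and rests on several assumptions that the caption does not spell out: the referenced iteration is taken to be the unconstrained update $x_{t+1} = x_t - \eta_t G(x_t, z_t)$ with $z_t \sim \mathcal{D}(x_t)$; the average is $\bar{x}_k = \frac{1}{k}\sum_{t=1}^{k} x_t$; and the limiting covariance is read as the sandwich form $W^{-1} \Sigma W^{-\top}$. Here $G(x, z) = x - z$, so the expected map is $R(x) = x - \rho (x_2, x_1)$, whose Jacobian is the matrix $W$ from the caption and whose zero is $x^\star = 0$ for $|\rho| < 1$. The function names, the starting point, and the reduced run and iteration counts are hypothetical choices made for speed.

```python
import numpy as np


def asymptotic_covariance(rho):
    """Return W^{-1} Sigma W^{-T} for the Figure 1 example, where Sigma = I_2."""
    W = np.array([[1.0, -rho], [-rho, 1.0]])
    W_inv = np.linalg.inv(W)
    return W_inv @ np.eye(2) @ W_inv.T


def run_averaged_iteration(rho, n_iters, n_runs, seed=0):
    """Run n_runs independent copies of the averaged iteration, vectorized over runs.

    Assumed update (no constraint set): x_{t+1} = x_t - eta_t * G(x_t, z_t),
    with G(x, z) = x - z, z_t ~ N(rho * (x_2, x_1), I_2), and eta_t = t^{-3/4}.
    Returns the average iterate of every run as an array of shape (n_runs, 2).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n_runs, 2))      # hypothetical starting point (the origin)
    x_bar = np.zeros((n_runs, 2))  # running average of the iterates
    for t in range(1, n_iters + 1):
        # Decision-dependent sampling: the mean of z is rho * (x_2, x_1).
        z = rho * x[:, ::-1] + rng.standard_normal((n_runs, 2))
        x = x - t ** (-0.75) * (x - z)
        x_bar += (x - x_bar) / t   # incremental update of the mean
    return x_bar


if __name__ == "__main__":
    rho, k = 0.5, 20_000
    x_bar = run_averaged_iteration(rho, n_iters=k, n_runs=1_000)
    # x_star = 0 here, so the scaled deviation is simply sqrt(k) * x_bar.
    empirical = np.cov(np.sqrt(k) * x_bar, rowvar=False)
    print("empirical covariance of sqrt(k) * x_bar:\n", empirical)
    print("W^{-1} Sigma W^{-T}:\n", asymptotic_covariance(rho))
```

With far fewer iterations than the $10^6$ used in the figure, the empirical covariance should match the limit only approximately. Pushing $\rho$ toward one makes $W$ nearly singular, which inflates the limiting covariance along the direction $(1, 1)$.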

Theorems & Definitions (52)

  • Theorem 1.1: Asymptotic normality, informal; see Theorem \ref{thm:anperf}
  • Theorem 1.2: Asymptotic optimality, informal; see Theorem \ref{thm:optimality}
  • Lemma 3.1: Deviation
  • Definition 3.2: Equilibrium point
  • Theorem 3.3: Existence
  • Proposition 4.1: Almost sure convergence
  • Theorem 4.2: Asymptotic normality
  • Example 4.3: Performative prediction with location-scale families
  • Example 4.4: Multiplayer performative prediction with location-scale families
  • Lemma 4.5: Lipschitz continuity and strong monotonicity
  • ...and 42 more