Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality

Joshua Cutler; Mateo Díaz; Dmitriy Drusvyatskiy

TL;DR

The paper develops a rigorous stochastic approximation framework for decision-dependent problems modeled by variational inequalities, where the data distribution responds to the current decision. It proves that the averaged stochastic forward-backward (SFB) iterates are asymptotically normal, with a covariance that cleanly separates gradient-noise effects from distributional shift, and identifies the Jacobian matrix $\nabla R(x^\star)$ that governs the limit. Leveraging Hájek–Le Cam theory, the authors also establish a local minimax lower bound and show that the averaged SFB estimator attains it, proving local asymptotic optimality in both single-agent and multiplayer performative settings. The results extend to minibatch variants and explain how distributional shift affects asymptotic uncertainty in performative prediction.

Abstract

We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotically normal, with a covariance that clearly decouples the effects of the gradient noise and the distributional shift. Moreover, building on the work of Hájek and Le Cam, we show that the asymptotic performance of the algorithm with averaging is locally minimax optimal.

Paper Structure (33 sections, 28 theorems, 216 equations, 1 figure, 1 algorithm)

Introduction
Summary of Main Results
Related Work
Learning with decision-dependent distributions.
Stochastic approximation.
Local minimax lower bounds in estimation.
Outline
Notation and Definitions
Strong monotonicity and smoothness.
Probability measures.
Notions of convergence.
Background on Learning with Decision-Dependent Distributions
Convergence and Asymptotic Normality
Proof of Theorem 4.2
Asymptotic Optimality
...and 18 more sections

Key Result

Theorem 1.1

Suppose that $G(\cdot,z)$ is $\alpha$-strongly monotone and Lipschitz continuous on $\mathcal{X}$, $G(x,\cdot)$ is $\beta$-Lipschitz continuous on $\mathcal{Z}$, and the distribution map $\mathcal{D}(\cdot)$ is $\gamma$-Lipschitz continuous on $\mathcal{X}$ with respect to the Wasserstein-1 distance where

Figures (1)

Figure 1: Consider the problem corresponding to $G(x, z) = \nabla_x \ell(x, z)$ with $\ell(x, z) = \frac{1}{2} \|x-z\|^{2}$ and $\mathcal{D}(x_{1}, x_{2}) = \mathsf{N}(\rho (x_{2}, x_{1}), I_2)$. A simple computation shows $\Sigma = I_{2}$ and $W = \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix}$. As $\rho$ approaches one, $W^{-1}$ becomes ill-conditioned. We run algorithm \ref{eqn:VI_iteration} $400$ times, each for $10^6$ iterations, using $\eta_{t} = t^{-3/4}$. The first row depicts the resulting average iterates laid over the confidence regions (plotted in logarithmic scale) corresponding to the asymptotic normal distribution. The next two rows depict kernel density estimates from the asymptotic normal distribution (top) and the deviation $\sqrt{k} (\bar{x}_{k} - x^\star)$ (bottom).
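The experiment in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the problem is unconstrained (so no projection or proximal step is needed), that the update is $x_{t+1} = x_t - \eta_t G(x_t, z_t)$ with $z_t \sim \mathcal{D}(x_t)$, and that the limiting covariance of $\sqrt{k}(\bar{x}_k - x^\star)$ takes the usual averaged stochastic approximation form $W^{-1} \Sigma W^{-\top}$. Under this reading the equilibrium is $x^\star = 0$ for $|\rho| < 1$, since $x = \rho (x_2, x_1)$ has only the zero solution. The function names, the starting point, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np


def asymptotic_covariance(rho):
    """Return W^{-1} Sigma W^{-T} for the caption's example (Sigma = I_2).

    Assumption: this sandwich form is the limiting covariance of the
    averaged iterates; the caption itself only names Sigma and W.
    """
    W = np.array([[1.0, -rho], [-rho, 1.0]])
    Sigma = np.eye(2)
    W_inv = np.linalg.inv(W)
    return W_inv @ Sigma @ W_inv.T


def run_averaged_iteration(rho, num_iters, seed=0):
    """Run x_{t+1} = x_t - eta_t * G(x_t, z_t) and return the average iterate.

    G(x, z) = x - z is the gradient of 0.5 * ||x - z||^2, and z is drawn
    from D(x) = N(rho * (x_2, x_1), I_2), so the data depends on x.
    """
    rng = np.random.default_rng(seed)
    x = np.ones(2)  # arbitrary starting point (assumption)
    running_sum = np.zeros(2)
    for t in range(1, num_iters + 1):
        mean = rho * x[::-1]               # swap coordinates: (x_2, x_1)
        z = mean + rng.standard_normal(2)  # sample z ~ D(x)
        step = t ** (-0.75)                # eta_t = t^{-3/4}, as in the caption
        x = x - step * (x - z)
        running_sum += x
    return running_sum / num_iters


if __name__ == "__main__":
    rho = 0.5
    x_bar = run_averaged_iteration(rho, num_iters=20000)
    print("average iterate:", x_bar)
    print("assumed asymptotic covariance:\n", asymptotic_covariance(rho))
```

As $\rho$ moves toward one, the entries of `asymptotic_covariance(rho)` grow like $1/(1-\rho^2)^2$, which is one way to see the ill-conditioning the caption mentions.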

Theorems & Definitions (52)

Theorem 1.1: Asymptotic normality, informal; see Theorem 4.2
Theorem 1.2: Asymptotic optimality, informal; see Theorem \ref{thm:optimality}
Lemma 3.1: Deviation
Definition 3.2: Equilibrium point
Theorem 3.3: Existence
Proposition 4.1: Almost sure convergence
Theorem 4.2: Asymptotic normality
Example 4.3: Performative prediction with location-scale families
Example 4.4: Multiplayer performative prediction with location-scale families
Lemma 4.5: Lipschitz continuity and strong monotonicity
...and 42 more
