Wasserstein Distributionally Robust Performative Prediction

Siyi Wang; Zifan Wang; Karl H. Johansson

TL;DR

This work introduces a Wasserstein distributionally robust optimization framework for performative prediction with a model-dependent ambiguity radius. By applying Lagrangian relaxation and strong duality, the intractable DRO objective is reformulated into a tractable min–max surrogate, enabling two iterative algorithms, DR-RRM and DR-RGD, to converge to a robust performative stable point under standard regularity conditions. Theoretical results establish convergence to a unique stable point in the exact setting and convergence to a neighborhood when inner maximizations are inexact, with explicit suboptimality bounds relative to the global performative optimum. Numerical experiments on dynamic credit scoring demonstrate improved resilience to strategic distribution shifts and sustained robust performance during retraining.

Abstract

Performativity means that the deployment of a predictive model incentivizes agents to strategically adapt their behavior, thereby inducing a model-dependent distribution shift. Practitioners often repeatedly retrain the model on data samples to adapt to evolving distributions. In this paper, we develop a Wasserstein distributionally robust optimization framework for performative prediction, where the prediction model is optimized over the worst-case distribution within a Wasserstein ambiguity set. We allow the ambiguity radius to depend on the prediction model, which subsumes the constant-radius formulation as a special case. By leveraging strong duality, the intractable robust objective is reformulated as a computationally tractable minimization problem. Based on this formulation, we develop distributionally robust repeated risk minimization (DR-RRM) and distributionally robust repeated gradient descent (DR-RGD) to iteratively find an equilibrium between distributional shifts and model retraining. Theoretical analyses demonstrate that, under standard regularity conditions, both algorithms converge to a unique robust performative stable point. Our analysis explicitly accounts for inner-loop approximation errors and shows convergence to a neighborhood of the stable point in inexact settings. Additionally, we establish theoretical bounds on the suboptimality gap between the stable point and the global performative optimum. Finally, numerical simulations of a dynamic credit scoring problem demonstrate the efficacy of the method.
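The retrain-then-shift loop described in the abstract can be illustrated with a toy sketch. This is not the paper's DR-RRM algorithm: it assumes a scalar model, squared loss and squared transport cost, a constant multiplier `lam` in place of a model-dependent one, and a hypothetical location-shift distribution map. The names `sample_shifted`, `robust_risk`, and `retrain` are invented for illustration.

```python
def sample_shifted(theta, base, eps):
    """Hypothetical distribution map D(theta): every base point moves by eps * theta."""
    return [b + eps * theta for b in base]


def robust_risk(theta, samples, lam):
    """Empirical dual surrogate, assuming l = (theta - zeta)^2 and c = (xi - zeta)^2.

    For lam > 1 the inner supremum has the closed form
    lam / (lam - 1) * (theta - xi)^2.
    """
    scale = lam / (lam - 1.0)
    return sum(scale * (theta - x) ** 2 for x in samples) / len(samples)


def retrain(samples, lam, theta0, lr=0.1, steps=500):
    """Minimize the surrogate on fixed samples by plain gradient descent.

    Assumes lr * 2 * lam / (lam - 1) < 2 so the iteration is stable.
    """
    theta = theta0
    mean = sum(samples) / len(samples)
    scale = lam / (lam - 1.0)
    for _ in range(steps):
        grad = 2.0 * scale * (theta - mean)  # gradient of the averaged surrogate
        theta -= lr * grad
    return theta


base = [1.0, 2.0, 3.0]   # assumed pre-deployment data
eps, lam = 0.5, 3.0      # assumed sensitivity and dual multiplier
theta = 0.0
for _ in range(60):
    samples = sample_shifted(theta, base, eps)  # agents react to the deployed model
    theta = retrain(samples, lam, theta)        # retrain on the shifted data

final_risk = robust_risk(theta, sample_shifted(theta, base, eps), lam)
print(theta, final_risk)
```

In this toy, each retraining round maps `theta` to `mean(base) + eps * theta`, so the loop is a contraction whenever `eps < 1` and settles at the fixed point `mean(base) / (1 - eps)`, where retraining on the induced data returns the same model.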

Paper Structure (10 sections, 8 theorems, 62 equations, 6 figures, 1 table, 2 algorithms)

Introduction
Problem Formulation
Performative prediction
Distributionally robust optimization
Main results
Performative risk minimization
Performative gradient descent
Suboptimality guarantee
Simulations
Conclusion

Key Result

Lemma 1

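(Sinha et al., 2017) Let functions $l(\theta,\xi):\Theta \times \Xi \rightarrow \mathbb{R}$ and $c(\xi,\zeta):\Xi \times \Xi \rightarrow \mathbb{R}$ be continuous. For any distribution $\hat{\mathbb{P}}$ and $\lambda(\theta)\ge 0$, we have that
$$\sup_{\mathbb{P}} \left\{ \mathbb{E}_{\mathbb{P}}[l(\theta,\xi)] - \lambda(\theta)\, W_c(\mathbb{P},\hat{\mathbb{P}}) \right\} = \mathbb{E}_{\hat{\mathbb{P}}}[f(\theta,\xi)],$$
with $f(\theta,\xi) = \sup_{\zeta} \{ l(\theta,\zeta) - \lambda(\theta) c(\xi,\zeta) \}$, where $W_c$ denotes the Wasserstein distance induced by the transport cost $c$.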
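The surrogate $f$ in Lemma 1 is itself an optimization problem over the perturbed sample $\zeta$. As a minimal sketch, assume a scalar model, squared loss $l(\theta,\zeta) = (\theta-\zeta)^2$, squared cost $c(\xi,\zeta) = (\xi-\zeta)^2$, and a constant multiplier `lam > 1`, so the inner objective is strictly concave. The helper names `inner_sup` and `closed_form` are hypothetical and not taken from the paper.

```python
def inner_sup(theta, xi, lam, steps=200):
    """Approximate f(theta, xi) = sup_zeta [(theta - zeta)^2 - lam * (xi - zeta)^2]
    by gradient ascent in zeta, starting from the nominal sample xi."""
    if lam <= 1.0:
        raise ValueError("lam must exceed 1 for the inner problem to be bounded")
    zeta = xi
    step = 1.0 / (4.0 * (lam - 1.0))  # half of the exact Newton step
    for _ in range(steps):
        # derivative of the inner objective with respect to zeta
        grad = -2.0 * (theta - zeta) - 2.0 * lam * (zeta - xi)
        zeta += step * grad
    value = (theta - zeta) ** 2 - lam * (xi - zeta) ** 2
    return value, zeta


def closed_form(theta, xi, lam):
    """Closed-form value of the same supremum, valid for lam > 1."""
    return lam / (lam - 1.0) * (theta - xi) ** 2


theta, xi, lam = 0.5, 2.0, 3.0
approx, zeta_star = inner_sup(theta, xi, lam)
exact = closed_form(theta, xi, lam)
print(approx, exact, zeta_star)
```

The maximizer moves the sample away from the model's prediction, and a larger `lam` penalizes that movement more heavily, pulling the surrogate back toward the nominal loss $(\theta-\xi)^2$. Stopping the ascent early gives an inexact inner solution, the setting in which the paper's analysis guarantees convergence only to a neighborhood of the stable point.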

Figures (6)

Figure 1: Distributionally robust (DR) performative prediction diagram.
Figure 2: Convergence of the distributionally robust performative risk minimization method (DR-RRM) for varying $\varepsilon$-sensitivity parameters. We add a marker if at the next iteration the distance between iterates is numerically zero.
Figure 3: Performance evolution of DR-RRM under strategic sensitivity $\varepsilon = 100$. Solid blue lines indicate the optimization phase, and dotted green lines indicate the distribution shift after classifier deployment.
Figure 4: Convergence of the distributionally robust performative gradient descent method (DR-RGD) for varying $\varepsilon$-sensitivity parameters.
Figure 5: Top: Transient performance at the first iteration, under performative prediction (PP), distributionally robust performative prediction (DR-PP), and a static baseline. "Pre" and "post" denote performance before and after the model adapts to the shifted distribution. Bottom: Steady-state performance at equilibrium. The plot contrasts the converged solutions of PP and DR-PP against the fixed static classifier.
...and 1 more figure

Theorems & Definitions (11)

Lemma 1
Lemma 2
Lemma 3
Proposition 1
Theorem 1
Remark 1
Theorem 2
Remark 2
Lemma 4
Theorem 3
...and 1 more
