Performative Prediction with Neural Networks

Mehrnaz Mofakhami, Ioannis Mitliagkas, Gauthier Gidel

TL;DR

This work tackles distribution shift caused by model deployment through a function-space formulation of performative prediction, where the data-generating distribution $\mathcal{D}$ depends on the predictor through its predictions $f_\theta$. Assuming that $\mathcal{D}$ is $\epsilon$-sensitive with respect to the Pearson $\chi^2$ divergence, and that the loss is $\gamma$-strongly convex in the predictions with gradient norm bounded by $M$, the authors prove that repeated risk minimization (RRM) converges to a unique performatively stable classifier when the function class $\mathcal{F}$ is convex, and to a neighborhood of such a classifier when $\mathcal{F}$ is non-convex but approximates the optimal predictor well. They introduce a constructive Resample-if-Rejected (RIR) procedure, show that it satisfies the required distribution-map assumptions (including $\epsilon$-sensitivity with respect to the $\chi^2$ divergence and a bounded norm ratio), and demonstrate it empirically on credit-scoring data using neural networks. The results narrow the gap between theory and practice for neural predictors under performative feedback. The condition $\frac{\sqrt{C\epsilon}\,M}{\gamma} < 1$ and related bounds quantify when convergence is guaranteed, which can guide the design of prediction-driven systems that remain stable under the shifts they induce.
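As a small worked illustration of the condition $\frac{\sqrt{C\epsilon}\,M}{\gamma} < 1$, the sketch below evaluates the left-hand side for given constants and solves the inequality for the largest admissible $\epsilon$. The function names and the numeric values are hypothetical, chosen only for the arithmetic; $C$ is not defined in this summary and is presumably the constant of the bounded norm ratio condition.

```python
import math


def convergence_ratio(C, epsilon, M, gamma):
    """Left-hand side of the condition sqrt(C * epsilon) * M / gamma < 1.

    All four constants are inputs supplied by the user; this summary
    does not give concrete values for them.
    """
    if C < 0 or epsilon < 0 or M < 0 or gamma <= 0:
        raise ValueError("expected C, epsilon, M >= 0 and gamma > 0")
    return math.sqrt(C * epsilon) * M / gamma


def max_sensitivity(C, M, gamma):
    """Supremum of epsilon for which the ratio stays below 1.

    Rearranging sqrt(C * epsilon) * M / gamma < 1 gives
    epsilon < gamma**2 / (C * M**2).
    """
    if C <= 0 or M <= 0 or gamma <= 0:
        raise ValueError("expected C, M, gamma > 0")
    return gamma ** 2 / (C * M ** 2)


# Hypothetical constants, for illustration only.
ratio = convergence_ratio(C=1.0, epsilon=0.04, M=2.0, gamma=1.0)
condition_holds = ratio < 1.0
epsilon_bound = max_sensitivity(C=1.0, M=2.0, gamma=1.0)
```

With these made-up constants the ratio is below 1, so the condition holds; any $\epsilon$ at or above `epsilon_bound` would violate it.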

Abstract

Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers that are performatively stable, i.e., optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous with respect to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function. In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.
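To make "optimal for the data distribution they induce" concrete, here is a minimal sketch of repeated risk minimization on a toy problem. It is not the paper's resampling procedure or its neural-network setup: the predictor is a single constant prediction, the loss is the squared error, and the hypothetical distribution map shifts the mean of the outcome linearly with the deployed prediction. The names `induced_mean` and `repeated_risk_minimization` and the values of `base_mean` and `shift` are assumptions made for illustration.

```python
def induced_mean(prediction, base_mean=1.0, shift=0.5):
    """Mean of the outcome y under the toy distribution D(prediction).

    Hypothetical distribution map: deploying `prediction` moves the
    outcome mean to base_mean + shift * prediction.
    """
    return base_mean + shift * prediction


def repeated_risk_minimization(initial_prediction, num_rounds=50, tol=1e-10):
    """Retrain on the distribution induced by the previous prediction.

    For the squared loss E[(p - y)^2], the risk minimizer over a
    constant prediction p is the mean of y, so each retraining step
    has a closed form and no sampling is needed.
    """
    prediction = initial_prediction
    history = [prediction]
    for _ in range(num_rounds):
        # Deploy `prediction`, observe D(prediction), refit on it.
        new_prediction = induced_mean(prediction)
        history.append(new_prediction)
        if abs(new_prediction - prediction) < tol:
            break
        prediction = new_prediction
    return history


history = repeated_risk_minimization(initial_prediction=0.0)
stable_prediction = history[-1]

# At a performatively stable point, retraining on the induced
# distribution returns (approximately) the same prediction.
stability_gap = abs(induced_mean(stable_prediction) - stable_prediction)
```

In this toy the iterates approach the fixed point of `p = base_mean + shift * p`, which is `base_mean / (1 - shift)`; with a shift of magnitude 1 or more the same loop would not settle, which mirrors the role of the sensitivity assumption in the abstract.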

Paper Structure (17 sections, 7 theorems, 79 equations, 3 figures)

Key Result

Theorem 1

(Informal) If the loss $\ell(f_{\theta}(x),y)$ is strongly convex in $f_{\theta}(x)$ with a bounded gradient norm, and the distribution map $f_{\theta} \mapsto \mathcal{D}(f_{\theta})$ is sufficiently Lipschitz with respect to the $\chi^2$ divergence and satisfies a bounded norm ratio condition, the ...

Figures (3)

  • Figure 1: Evolution of log of performative risk (left) and accuracy (right) through iterations of RRM for $\delta=0.9$. The blue lines show the changes in risk (accuracy) after optimizing on the distribution induced by the last model, and the green lines show the effect of the distribution shift on the risk (accuracy).
  • Figure 2: Evolution of log of performative risk for different values of $\delta=0.1, 0.4, 0.7, 0.9$ through iterations of RRM.
  • Figure 3: Evolution of log of performative risk through iterations of RRM for different values of hidden size $h=8,16,32$ for $\delta=0.7$ and $\delta=0.9$.

Theorems & Definitions (19)

  • Theorem 1
  • Definition 2.1: Performative Risk
  • Definition 2.2
  • Definition 2.3: RRM
  • Remark 1
  • Proposition 1
  • proof
  • Theorem 2
  • Theorem 3
  • Remark 2
  • ...and 9 more