Dissecting Performative Prediction: A Comprehensive Survey

Thomas Kehrenberg, Javier Sanguino, Jose A. Lozano, Novi Quadrianto

TL;DR

This survey formalizes performative prediction through the distribution map $\mathcal{D}(\theta)$, describing how deploying a predictor shifts the environment and creates a feedback loop. It defines two core objectives, performative stability ($\theta_{PS}$) and performative optimality ($\theta_{PO}$), and introduces a classification by level of access to the distribution map, which organizes its review of models, fitting methods, and optimization algorithms. The paper surveys mathematical models of distribution maps (including strategic classification, base-distribution families, and transition maps), techniques for fitting such maps, and algorithms for reaching stable or optimal points (repeated risk minimization (RRM), repeated gradient descent, stochastic methods, bilevel approaches, and others), with extensions to stateful and multi-deployer settings. It further discusses connections to adversarial robustness, algorithmic recourse, delayed impact, and fairness, and highlights practical challenges in data collection and benchmarking, including the need for standardized datasets for performative prediction (PP) research.

Abstract

The field of performative prediction had its beginnings in 2020 with the seminal paper "Performative Prediction" by Perdomo et al., which established a novel machine learning setup where the deployment of a predictive model causes a distribution shift in the environment, which in turn causes a mismatch between the distribution expected by the predictive model and the real distribution. This shift is defined by a so-called distribution map. In the half-decade since, a literature has emerged which has, among other things, introduced new solution concepts to the original setup, extended the setup, offered new theoretical analyses, and examined the intersection of performative prediction and other established fields. In this survey, we first lay out the performative prediction setting and explain the different optimization targets: performative stability and performative optimality. We introduce a new way of classifying different performative prediction settings, based on how much information is available about the distribution map. We survey existing implementations of distribution maps and existing methods to address the problem of performative prediction, while examining different ways to categorize them. Finally, we point out known and previously unknown connections that can be drawn to other fields, in the hopes of stimulating future research.

Paper Structure

This paper contains 72 sections, 2 theorems, 67 equations, 5 figures.

Key Result

Theorem 4.1

If the loss function $\ell(\theta, z)$ is $\gamma$-strongly convex and $\beta$-jointly smooth, then repeated retraining, as defined in equation eq:iteration, converges to a unique stable point as long as the $\epsilon$-sensitivity of the distribution map $\mathcal{D}(\cdot)$ satisfies $\epsilon<\frac{\gamma}{\beta}$.
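A minimal numerical sketch of the repeated retraining that Theorem 4.1 refers to, under assumptions that are not taken from the survey: a squared loss $\ell(\theta, z) = \tfrac{1}{2}(\theta - z)^2$ (so $\gamma = \beta = 1$) and a hypothetical distribution map $\mathcal{D}(\theta) = \mathcal{N}(\mu_0 + \epsilon\theta, 1)$, which is $\epsilon$-sensitive in the Wasserstein-1 sense used by Perdomo et al. With their bound $\epsilon < \gamma/\beta$, retraining should contract for $\epsilon < 1$ toward the stable point $\mu_0 / (1 - \epsilon)$. All function names and parameter values are illustrative.

```python
import random
from statistics import fmean

MU0 = 1.0      # assumed base mean of the environment (hypothetical value)
EPSILON = 0.5  # assumed sensitivity of the distribution map; needs to be < 1 here


def sample_from_map(theta, n, rng):
    """Draw n samples from the assumed map D(theta) = N(MU0 + EPSILON * theta, 1)."""
    return [rng.gauss(MU0 + EPSILON * theta, 1.0) for _ in range(n)]


def retrain(samples):
    """Minimize the empirical risk of 0.5 * (theta - z)^2; the minimizer is the sample mean."""
    return fmean(samples)


def repeated_retraining(theta0, steps=25, n=5000, seed=0):
    """Alternate between deploying theta and refitting on the data it induces."""
    rng = random.Random(seed)
    thetas = [theta0]
    for _ in range(steps):
        samples = sample_from_map(thetas[-1], n, rng)
        thetas.append(retrain(samples))
    return thetas


thetas = repeated_retraining(theta0=-3.0)
stable_point = MU0 / (1.0 - EPSILON)  # fixed point of theta -> MU0 + EPSILON * theta
print(f"last iterate: {thetas[-1]:.3f}, analytic stable point: {stable_point:.3f}")
```

In this toy setup each retraining step halves the distance to the stable point in expectation, up to sampling noise. Setting `EPSILON` to 1 or more removes the contraction, and `stable_point` is then no longer a meaningful target.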

Figures (5)

  • Figure 1: Let $\ell(z;\theta) = z\cdot (\theta^2+1)$ and $\mathcal{D}(\theta) = \mathcal{N}(\sqrt{a_1\theta + a_0}; \sigma)$ with $a_0,a_1\in\mathbb{R}$. The figure shows three examples of picking a distribution and plotting the corresponding risk: $\mathcal{D}(\theta_0)$, $\mathcal{D}(\theta_\mathit{PS})$, and $\mathcal{D}(\theta_\mathit{PO})$. In the first plot, a), we fix the distribution to $\mathcal{D}(\theta_0)$, where $\theta_0$ is an arbitrary starting point. To show the performative risk, we mark $\mathrm{Risk}(\theta_0, \mathcal{D}(\theta_0))$. We see that $\theta_0$ is not stable, as it is not the minimum of the risk for the distribution that $\theta_0$ induces. The second plot, b), shows the risk for a stable point $\theta_\mathit{PS}$. We can certify that it is stable because, for the given distribution $\mathcal{D}(\theta_\mathit{PS})$, it achieves the lowest risk. In general, there can be more than one stable point. Finally, plot c) shows the optimal point, $\theta_\mathit{PO}$. We cannot tell from the plot that this is the optimal point (see Fig. 2 for a different plot where we are able to tell). As the plot shows, the optimal point does not necessarily produce the lowest risk for its own distribution $\mathcal{D}(\theta_\mathit{PO})$, but it produces the lowest performative risk of all $\theta\in\Theta$. In other words, the optimal point need not be stable.
  • Figure 2: Plot a) shows a contour plot (elevation view) of the 3D surface obtained by stacking all possible risk curves, $\mathrm{Risk}(\theta,\mathcal{D}(\theta'))$ for all $\theta,\theta'\in\Theta$, for the example used in Fig. 1. Note that Fig. 1 shows three instances of such curves; the corresponding sections of the surface are marked in the contour plot. The optimal point is the minimum of the section where $\theta=\theta'$ (marked with a dotted line), i.e., $\theta_\mathit{PO}=\mathop{\mathrm{arg\,min}}\limits_{\theta \in \Theta} \mathit{PR}(\theta) =\mathop{\mathrm{arg\,min}}\limits_{\theta \in \Theta} \mathbb{E}_{z \sim \mathcal{D}(\theta)}[\ell(\theta;z)]$. The corresponding performative risk with its minimum is shown in b). The difference between plot c) in Fig. 1 and plot b) in this figure is that the former shows the risk for the fixed distribution $\mathcal{D}(\theta_\mathit{PO})$, whereas the latter shows the performative risk, $\mathit{PR}$.
  • Figure 3: Training a model in the PP framework involves first collecting information on the distribution map. In the simplest case, we just collect samples and train the model on the samples. However, some methods for addressing performative prediction need more than samples: they need a mathematical model of the performative mechanism or of the whole distribution map (see the beginning of Section \ref{sec:opt-perf-opt} for an explanation). To go from samples to a mathematical model of the distribution map, an estimation mechanism (discussed in Section \ref{ssec:distribution-map-model-fitting}) can be used, which, however, incurs an estimation error. With a mathematical model of the distribution map, it is possible to apply optimization algorithms to equation \ref{eq:risk} when $Q$ depends on $\theta$. We discuss algorithms to find the stable point and optimal point in Sections \ref{sec:opt-perf-stable} and \ref{sec:opt-perf-opt} respectively. Once the model is trained, it is deployed, causing a distribution shift and the necessity of retraining the model.
  • Figure 4: A diagram and a table of different sub-families of the base-distribution family (Section \ref{ssec:base-distribution-with-model-dependent-transformation}): strategic classification (Section \ref{sssec:strategic-classification}), location-scale family (Section \ref{sssec:location-scale-families}), and (reparameterizable) functional forms (Section \ref{sssec:arbitrary-mechanisms-specified-in-functional-form}). The Venn diagram on the left shows how these sub-families can overlap, and on the right there are examples for each distinct region in the Venn diagram. An overlap of the first circle (strategic classification) and the third circle (reparameterizable functional form) is also conceivable (for example, a strategic classification setup with a synthetic base distribution) but does not represent an instructive category.
  • Figure 5: The access pyramid establishes what methods can be applied depending on the level of access to the distribution map, as defined in Section \ref{sssec:levels-of-access}. Methods that only require a lower level can also be applied when a higher level of access is available: if we, for example, have access to the model of the dynamics, then we can also produce samples. Working on level 2 usually requires fitting a mathematical model to collected samples, as described in Section \ref{ssec:distribution-map-model-fitting}.
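
The running example of Figures 1 and 2 can be checked numerically. The sketch below assumes that $\mathcal{N}(\cdot\,;\sigma)$ is parameterized by its mean, so that $\mathrm{Risk}(\theta,\mathcal{D}(\theta')) = \sqrt{a_1\theta' + a_0}\,(\theta^2+1)$. The values $a_0 = 1$, $a_1 = 0.5$, the domain $\Theta = [-1, 1]$, and all function names are illustrative choices, not taken from the survey.

```python
import math

A0, A1 = 1.0, 0.5  # assumed coefficients of the distribution map (hypothetical values)
GRID = [i / 10000 for i in range(-10000, 10001)]  # assumed domain Theta = [-1, 1]


def mean_of_map(theta):
    """Mean of D(theta) = N(sqrt(A1 * theta + A0); sigma), assuming a mean parameterization."""
    return math.sqrt(A1 * theta + A0)


def risk(theta, theta_deployed):
    """Expected loss of theta on D(theta_deployed): E[z] * (theta^2 + 1)."""
    return mean_of_map(theta_deployed) * (theta ** 2 + 1)


def performative_risk(theta):
    """Risk of theta on the distribution that theta itself induces."""
    return risk(theta, theta)


def best_response(theta_deployed):
    """Grid minimizer of the risk for the fixed distribution D(theta_deployed)."""
    return min(GRID, key=lambda t: risk(t, theta_deployed))


# Stable point: a fixed point of the best-response map, found here by iterating it.
theta_ps = 0.8
for _ in range(10):
    theta_ps = best_response(theta_ps)

# Optimal point: grid minimizer of the performative risk.
theta_po = min(GRID, key=performative_risk)

print(f"theta_PS = {theta_ps:.4f}, PR = {performative_risk(theta_ps):.4f}")
print(f"theta_PO = {theta_po:.4f}, PR = {performative_risk(theta_po):.4f}")
```

Under these assumptions the best response to any fixed distribution is $\theta = 0$, so $\theta_\mathit{PS} = 0$. Setting the derivative of the performative risk to zero gives $5a_1\theta^2 + 4a_0\theta + a_1 = 0$, whose root in $\Theta$ is $(-4+\sqrt{11})/5 \approx -0.137$. That point attains a lower performative risk than $\theta_\mathit{PS}$ but is not a best response to its own distribution, which matches the caption's remark that the optimal point need not be stable.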

Theorems & Definitions (5)

  • Theorem 4.1
  • Definition 4.1: Sensitivity
  • Theorem 4.2
  • Definition 5.1: Mixture dominance
  • Definition A.1: Joint smoothness