The Limitations of Model Retraining in the Face of Performativity
Anmol Kabra, Kumar Kshitij Patel
TL;DR
The paper investigates how performativity, meaning changes in the data distribution induced by the deployed model, undermines naive retraining. It formalizes the performative risk $PR(\theta)$ and analyzes two solution sets: the performatively stable points $\Theta_{\text{PS}}$, which are the fixed points of retraining, and the performatively optimal points $\Theta_{\text{PO}}$, which minimize $PR$. Even when $PR$ is convex, these two sets can differ under simple linear shifts with covariance components. The authors propose Regularized Repeated Risk Minimization (Reg-R-RM) and Regularized Repeated Empirical Risk Minimization (Reg-R-ERM) to correct this fixed-point discrepancy and to combat finite-sample error. They show that appropriately chosen regularization can drive convergence to $\Theta_{\text{PO}}$ and that Reg-R-ERM converges under reasonable sample schedules. These results suggest rethinking retraining under performativity, balancing data collection with regularization to obtain performatively optimal models in practice.
Abstract
We study stochastic optimization in the context of performative shifts, where the data distribution changes in response to the deployed model. We demonstrate that naive retraining can be provably suboptimal even for simple distribution shifts. The issue worsens when models are retrained given a finite number of samples at each retraining step. We show that adding regularization to retraining corrects both of these issues, attaining provably optimal models in the face of distribution shifts. Our work advocates rethinking how machine learning models are retrained in the presence of performative effects.
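To make the gap between stable and optimal points concrete, here is a minimal one-dimensional sketch. Everything in it is an assumption chosen for illustration and is not taken from the paper: the distribution map $D(\theta) = \mathcal{N}(\mu + \varepsilon\theta, 1)$, the loss $\ell(z;\theta) = \theta^2/2 - z\theta$, the penalty $(\lambda/2)\theta^2$, and the function names. In this toy, plain retraining settles at $\mu/(1-\varepsilon)$ while the minimizer of $PR$ is $\mu/(1-2\varepsilon)$, and the choice $\lambda = -\varepsilon$ (positive here because $\varepsilon < 0$) moves the fixed point of retraining onto the optimum.

```python
# Toy illustration only: the distribution map, loss, and regularizer are
# assumptions for this sketch, not the paper's construction.

def mean_shift(theta, mu=1.0, eps=-0.5):
    """E[z] under the hypothetical map D(theta) = N(mu + eps * theta, 1)."""
    return mu + eps * theta

def performative_risk(theta, mu=1.0, eps=-0.5):
    """PR(theta) = E_{z ~ D(theta)}[theta^2 / 2 - z * theta]."""
    return 0.5 * theta**2 - theta * mean_shift(theta, mu, eps)

def retrain(theta0=0.0, lam=0.0, steps=200, mu=1.0, eps=-0.5):
    """Repeated risk minimization with an added (lam / 2) * theta^2 penalty.

    Each step solves, in closed form,
    argmin_theta (1 + lam) * theta^2 / 2 - theta * E_{z ~ D(theta_t)}[z].
    """
    theta = theta0
    for _ in range(steps):
        theta = mean_shift(theta, mu, eps) / (1.0 + lam)
    return theta

mu, eps = 1.0, -0.5
theta_ps = retrain(lam=0.0)         # fixed point of plain retraining
theta_reg = retrain(lam=-eps)       # regularized retraining, lam = -eps
theta_po = mu / (1.0 - 2.0 * eps)   # closed-form minimizer of PR

print(theta_ps, theta_reg, theta_po)
print(performative_risk(theta_ps), performative_risk(theta_po))
```

The value $\lambda = -\varepsilon$ is specific to this toy problem; it says nothing about how the regularization strength should be chosen in general.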
