When to retrain a machine learning model

Regol Florence, Schwinn Leo, Sprague Kyle, Coates Mark, Markovich Thomas

TL;DR

This work tackles the practical problem of when to retrain machine learning models in the presence of continuous data drift, balancing retraining costs against future performance. It introduces UPF, an uncertainty-based retraining framework that forecasts future model performance using a Beta-distributed performance model, approximated by a Gaussian for tractable learning, and makes cost-aware decisions via a quantile-based rule. The approach rests on a principled objective that combines retraining costs with performance over the full horizon, with theoretical bounds guiding retraining frequency and simple, data-efficient predictors used for forecasting. Empirical results on seven datasets show that UPF consistently outperforms shift-detection baselines and CARA-like methods, even under misspecified retraining costs, highlighting its practical robustness and applicability in real-world, low-data settings.
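To make the idea of a quantile-based, cost-aware rule concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the moment-matching step, the independence assumption between the two forecasts, and the exact form of the decision rule (retrain when a chosen quantile of the forecast gain exceeds the cost $\alpha$) are all assumptions made for illustration.

```python
from statistics import NormalDist


def beta_to_gaussian(a, b):
    """Moment-match a Beta(a, b) distribution with a Gaussian (mean, std).

    Hypothetical helper: one simple way to approximate a Beta by a Gaussian.
    """
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


def should_retrain(keep_err, retrain_err, alpha, quantile=0.5):
    """Decide whether to retrain from two Gaussian error forecasts.

    keep_err, retrain_err: (mean, std) of the forecast total error over the
    remaining horizon if we keep the current model or retrain now.
    alpha: relative cost of one retrain, in the same units as the error.
    quantile: which quantile of the forecast gain must exceed alpha
    (assumed rule; a lower quantile demands more certainty before retraining).
    """
    # Gain from retraining = error avoided; forecasts assumed independent.
    gain_mean = keep_err[0] - retrain_err[0]
    gain_std = (keep_err[1] ** 2 + retrain_err[1] ** 2) ** 0.5
    if gain_std == 0.0:
        return gain_mean > alpha
    gain_q = NormalDist(gain_mean, gain_std).inv_cdf(quantile)
    return gain_q > alpha


# Illustrative numbers only (not taken from the paper).
decision_median = should_retrain((3.0, 0.5), (1.0, 0.5), alpha=1.5, quantile=0.5)
decision_cautious = should_retrain((3.0, 0.5), (1.0, 0.5), alpha=1.5, quantile=0.1)
```

With these made-up numbers, the median gain exceeds the cost, so the median rule retrains, while the more cautious 10% quantile does not. This illustrates how forecast uncertainty, and not only the expected gain, can change the decision.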

Abstract

A significant challenge in maintaining real-world machine learning models is responding to the continuous and unpredictable evolution of data. Most practitioners are faced with the difficult question: when should I retrain or update my machine learning model? This seemingly straightforward problem is particularly challenging for three reasons: 1) decisions must be made based on very limited information, as we usually have access to only a few examples, 2) the nature, extent, and impact of the distribution shift are unknown, and 3) it involves specifying a cost ratio between retraining and poor performance, which can be hard to characterize. Existing works address certain aspects of this problem, but none offer a comprehensive solution. Distribution shift detection falls short as it cannot account for the cost trade-off; the scarcity of the data, paired with its unusual structure, makes it a poor fit for existing offline reinforcement learning methods; and the online learning formulation overlooks key practical considerations. To address this, we present a principled formulation of the retraining problem and propose an uncertainty-based method that makes decisions by continually forecasting the evolution of model performance evaluated with a bounded metric. Our experiments addressing classification tasks show that the method consistently outperforms existing baselines on 7 datasets.
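The trade-off between retraining cost and poor performance can be illustrated by scoring a retraining schedule. The sketch below assumes a cost of the form "$\alpha$ times the number of retrains, plus the error of whichever model is in use at each step"; this form, the function names, and the toy error model are assumptions for illustration, not the paper's exact objective.

```python
def schedule_cost(theta, pe, alpha):
    """Total cost of a binary retraining schedule.

    theta: list of 0/1 flags, theta[t] == 1 means "retrain at step t".
    pe: callable pe(i, t) giving the error at step t of the model trained
        at step i (hypothetical interface).
    alpha: relative cost of one retrain, in the same units as the error.
    """
    current = 0  # model in use; model 0 is assumed to be trained before t = 0
    total = 0.0
    for t, retrain in enumerate(theta):
        if retrain:
            current = t
            total += alpha  # pay the retraining cost
        total += pe(current, t)  # pay the error of the model in use
    return total


def toy_pe(i, t):
    """Made-up error model: error grows linearly with the model's age."""
    return 0.1 + 0.02 * (t - i)


T = 12
never = [0] * T
twice = [1 if t in (4, 8) else 0 for t in range(T)]  # retrain at t=4 and t=8

cost_never = schedule_cost(never, toy_pe, alpha=0.1)
cost_twice = schedule_cost(twice, toy_pe, alpha=0.1)
cost_twice_expensive = schedule_cost(twice, toy_pe, alpha=1.0)
```

Under this toy error model, retraining twice is cheaper than never retraining when $\alpha = 0.1$ and more expensive when $\alpha = 1.0$, which is why the cost ratio must be specified before a schedule can be judged.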

Paper Structure

This paper contains 40 sections, 3 theorems, 69 equations, 18 figures, 12 tables.

Key Result

Proposition 3.1

Given that $L \geq |pe_{i,t} - pe_{i+1,t}|$ $\forall t \in [T]$, a horizon $T \in \mathbb{N}$, and a relative cost of retraining $\alpha$, the number of retrains of the solution to Equation eq:obj, $r^* \triangleq \|\boldsymbol{\theta}^*\|_1$, satisfies:

Figures (18)

  • Figure 1: The Retraining Problem: The performance of a model trained on a dataset $\mathcal{D}_i$ gradually decreases when evaluated on more recent datasets in the presence of distribution shift. The task is to determine when retraining is beneficial compared to keeping an older model. We must take into consideration the trade-off between potential accuracy gains and the costs associated with retraining. In the training schedule $\boldsymbol{\theta}$ shown here, retraining occurs twice, at $t=4$ and $t=8$.
  • Figure 2: Results on the electricity dataset. Top) Cost $\hat{C}_{\alpha}(\boldsymbol{\theta})$ vs $\alpha$. Bottom) Number of retrains vs $\alpha$. In the top figure, we can see that UPF consistently reaches low $\hat{C}_{\alpha}(\boldsymbol{\theta})$ across different $\alpha$. In the bottom figure, the number of retrains of UPF follows the optimal baseline more closely.
  • Figure 3: Impact of a wrong $\alpha$, measured by the percentage increase of $\hat{C}_{\alpha}(\boldsymbol{\theta})$ on the epicgames dataset. Left) CARA. Right) UPF. Overall, both methods are reasonably robust to a wrong $\alpha$ specification, with UPF being the more robust.
  • Figure 4: Results on the Gauss dataset, with the $\alpha$ values from Proposition 3.1 providing different upper bounds on the optimal number of retrains $r^*$. Left) Cost $\hat{C}_{\alpha}(\boldsymbol{\theta})$ vs $\alpha$. Right) Number of retrains vs $\alpha$.
  • Figure 5: Airplanes. Cost $\hat{C}_{\alpha}(\boldsymbol{\theta})$ vs $\alpha$ with the forecasting performance metrics (MAE and bias).
  • ...and 13 more figures

Theorems & Definitions (3)

  • Proposition 3.1
  • Lemma 1.1
  • Theorem 1.2: Standard generalization in the Gaussian model (from schmidt2018adversarially)