When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

Ren Fujiwara; Yasuko Matsubara; Yasushi Sakurai

TL;DR

CALIPER: a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining, closing the gap between drift detection and data-sufficient adaptation in streaming learning.

Abstract

Sudden concept drift makes previously trained predictors unreliable, yet the questions of when to retrain and how much post-drift data is sufficient are rarely addressed. We propose CALIPER, a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter $\theta$. When an effective sample size (ESS) gate is satisfied, a monotonically non-increasing trend in this error as the locality parameter increases indicates that the post-drift data are sufficiently informative for retraining. We also provide a theoretical analysis of our method and show that the algorithm has low per-update time and memory costs. Across datasets from four heterogeneous domains, three learner families, and two detectors, CALIPER consistently matches or exceeds the best fixed retraining data size while incurring negligible overhead, and it often outperforms incremental updates. CALIPER thus closes the gap between drift detection and data-sufficient adaptation in streaming learning.
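One way to write down the quantities named in the abstract is sketched below. It is an interpretation under stated assumptions, not the paper's exact equations: the reference pairs are assumed to be one-step transitions $\mathbf{y}_i=\mathbf{x}_{i+1}$ from the post-drift window, distances to the query are assumed to be normalized by their mean $\bar{\delta}$, the local model is assumed affine, the proxy error is assumed to be the squared one-step error at the query, and the ESS is assumed to follow Kish's formula. Only the kernel $\exp(-\theta r)$, the gate $\mathrm{ESS}\ge C(d+1)$ evaluated at the largest $\theta$, and the ordering $0<\theta_1<\cdots<\theta_K$ are taken from the paper.

```latex
\begin{aligned}
r_i &= \frac{\lVert \mathbf{x}_i - \mathbf{x}_q \rVert}{\bar{\delta}}, \qquad
w_i(\theta) = \exp(-\theta\, r_i), \qquad
\tilde{\mathbf{x}}_i = (1, \mathbf{x}_i^\top)^\top \in \mathbb{R}^{d+1},\\
\hat{\mathbf{B}}_\theta &= \arg\min_{\mathbf{B}\in\mathbb{R}^{(d+1)\times d}}
  \sum_{i=1}^{n} w_i(\theta)\,\bigl\lVert \mathbf{y}_i - \mathbf{B}^\top \tilde{\mathbf{x}}_i \bigr\rVert^2
  = \bigl(\tilde{\mathbf{X}}^\top \mathbf{W}_\theta \tilde{\mathbf{X}}\bigr)^{-1}
    \tilde{\mathbf{X}}^\top \mathbf{W}_\theta \mathbf{Y},\\
e(\theta) &= \bigl\lVert \hat{\mathbf{B}}_\theta^\top \tilde{\mathbf{x}}_q - \mathbf{y}_q \bigr\rVert^2,
\qquad
\mathrm{ESS}(\theta) = \frac{\bigl(\sum_{i=1}^{n} w_i(\theta)\bigr)^2}{\sum_{i=1}^{n} w_i(\theta)^2},\\
\text{trigger} &\iff \mathrm{ESS}(\theta_K) \ge C\,(d+1)
  \ \text{ and }\ e(\theta_1) \ge e(\theta_2) \ge \cdots \ge e(\theta_K).
\end{aligned}
```

Here $\mathbf{W}_\theta=\mathrm{diag}(w_1(\theta),\dots,w_n(\theta))$, and the rows of $\tilde{\mathbf{X}}$ and $\mathbf{Y}$ are $\tilde{\mathbf{x}}_i^\top$ and $\mathbf{y}_i^\top$. The $d+1$ in the gate matches the number of coefficients per output in the affine fit; in practice the condition must also hold over consecutive updates before retraining fires.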

Paper Structure (20 sections, 1 theorem, 26 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Proposed Method: Cumulative Assessment of Locality Indicator for Post-drift Estimation of Retraining-size (CALIPER)
Problem Definition
CALIPER: Cumulative Assessment of Locality Indicator for Post-drift Estimation of Retraining Data Size
Stopping Criterion
Online Estimation Algorithm (CALIPER)
Theoretical Analysis for CALIPER
Experimental Results
Related Work
Concept Drift
Analysis of Nonlinear Dynamics
Conclusion
Algorithm Overview
Additional Related Works
Limitation and Future work
...and 5 more sections

Key Result

Proposition 1

Fix a compact set $B\subset\mathbb{R}^d$ and a constant $c\ge L$ as in Eq. (eq:alpha). Let $\Theta=\{\theta_k\}_{k=1}^K$ be CALIPER's locality grid, ordered as $0<\theta_1<\cdots<\theta_K=\theta_{\max}$, and let $\{r_k\}$ be the induced effective-radius grid, defined by $r_k=r^{\mathrm{eff}}(\ldots)$ […] In particular, $\mathbf{X}$ exhibits state dependence on $B$ at scale $r$ in the sense of equation […]
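The informal notion of state dependence behind this result (nearby states exhibit similar one-step transitions, as in Figure 1(b)) can be illustrated with a small Python experiment. This is a hypothetical illustration, not the paper's Definition 2: the logistic map $x_{t+1}=3.9\,x_t(1-x_t)$ stands in for $f$, and the helpers `logistic_series` and `transition_spread`, the neighbour count `k`, and the seeds are all assumptions. For each query state, the sketch compares the spread of next states among its `k` nearest neighbours with the spread among `k` random states; shuffling the series keeps the marginal distribution but removes the dependence of $x_{t+1}$ on $x_t$.

```python
import numpy as np


def logistic_series(n, r=3.9, x0=0.2):
    """Hypothetical example system: the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1.0 - x[t])
    return x


def transition_spread(x, k=10, n_queries=200, seed=0):
    """Return (near, rand): the mean |x_{j+1} - x_{q+1}| over the k nearest
    state-space neighbours j of a query q, and over k randomly chosen states j."""
    rng = np.random.default_rng(seed)
    states, nexts = x[:-1], x[1:]
    idx = np.arange(len(states))
    near, rand = [], []
    for q in rng.choice(len(states), size=n_queries, replace=False):
        dist = np.abs(states - states[q])
        dist[q] = np.inf                                       # exclude the query itself
        nn = np.argsort(dist)[:k]                              # k nearest neighbours
        rr = rng.choice(idx[idx != q], size=k, replace=False)  # k random states
        near.append(np.mean(np.abs(nexts[nn] - nexts[q])))
        rand.append(np.mean(np.abs(nexts[rr] - nexts[q])))
    return float(np.mean(near)), float(np.mean(rand))


x = logistic_series(2000)
near, rand = transition_spread(x)
# Shuffling keeps the marginal distribution but breaks the map x_t -> x_{t+1}.
x_shuf = np.random.default_rng(1).permutation(x)
near_s, rand_s = transition_spread(x_shuf)
print(f"dynamical: near={near:.3f} random={rand:.3f}")
print(f"shuffled:  near={near_s:.3f} random={rand_s:.3f}")
```

On the dynamical series the neighbour spread should be far smaller than the random spread, whereas on the shuffled series the two are comparable; this contrast is the kind of signal a data-only sufficiency test can exploit.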

Figures (6)

Figure 1: Overview of CALIPER. (a) Unlike window-based detectors (e.g., ADWIN/KSWIN), which only indicate whether and when drift occurs, CALIPER estimates how much post-drift data are needed for stable retraining. (b) State dependence: for dynamical systems $(\mathbf{x}_{t+1}=f(\mathbf{x}_t))$, nearby states exhibit similar one-step transitions; thus, data sufficiency reduces to testing whether the post-drift window exhibits adequate state dependence. (c) Pipeline: a locality parameter $\theta$ reweights nearby samples in weighted local regression; when the proxy error is monotonically non-increasing as $\theta$ increases and the neighborhood is sufficiently populated, retraining is triggered. (d) Results: star markers denote CALIPER's estimated data sizes, which yield low post-drift errors across heterogeneous learners (Kernel Ridge Regression (KRR), MLP, Transformer). CALIPER selects the optimal post-drift data size, i.e., the point at which retraining becomes stable, without performing any retraining.
Figure 2: CALIPER, a model-agnostic framework for dynamically estimating the data size required for retraining after sudden concept drift in a data stream. (i) Window normalization and split: after a drift alarm, the post-drift segment is normalized and partitioned into reference pairs $(\mathbf{X}_h, \mathbf{Y}_h)$ and a query $(\mathbf{x}_q, \mathbf{y}_q)$. (ii) ESS check: kernel weights $\mathbf{w}_\theta = \exp(-\theta\,\mathbf{r})$ at the largest $\theta$ define an effective neighborhood; the procedure continues only if $\mathrm{ESS} \geq C(d+1)$. (iii) Weighted local regression: for each $\theta$ on a fixed grid, the weighted normal equations are solved to obtain $\hat{\mathbf{y}}_\theta$, and a proxy prediction error is computed. (iv) Test and trigger: an error that is monotonically non-increasing as $\theta$ increases, sustained over consecutive updates, serves as a proxy for sufficient local regularity (state dependence) and triggers retraining. The rightmost panel illustrates the monotone trend of the error versus $\theta$.
Figure 3: Performance of CALIPER on four datasets (MoCap, TEP, Automobile, Dysts) and three model families (KRR, MLP, Transformer). We compare fixed data sizes (128/512/2048; blue) with CALIPER (orange, “CALIPER”). Each panel reports MSE (left, “-i”) and MAE (right, “-ii”) as a function of the retraining data size after each drift detected by ADWIN (circles) or KSWIN (squares). CALIPER matches or exceeds the best fixed data size without per-dataset tuning; notably, the data size selected by CALIPER typically aligns with the dataset-specific optimal fixed data size. Full results are given in Tables tab:fullresult-KSWIN and tab:fullresult-ADWIN in Appendix apseq: additional-result.
Figure 4: Per-time-step and average wall-clock time on Dysts sequences #1 and #2 for ADWIN/KSWIN with CALIPER and with fixed-size buffers (128/512/2048). The curves are flat with low means; occasional spikes reflect retraining, not CALIPER's own overhead.
Figure 5: Average per-time-step wall-clock time (bar chart) for MoCap, TEP, Automobile, and Dysts under ADWIN/KSWIN, comparing CALIPER with fixed data sizes (128/512/2048) for KRR, MLP, and Transformer learners; CALIPER's runtime is on par with the fixed-data-size baselines.
...and 1 more figure
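Steps (i) to (iv) of the Figure 2 pipeline can be sketched in Python as below. This is a minimal illustration under assumptions, not the authors' implementation: `caliper_check`, `RetrainTrigger`, the $\theta$ grid, $C=2$, the patience of 3 consecutive passes, per-feature z-scoring, mean-normalized distances, Kish's ESS, taking the most recent transition as the query, and the squared-error proxy are all hypothetical choices. It also recomputes the check over the whole window at every update, so it does not reproduce the paper's low-cost single-pass update.

```python
import numpy as np


def caliper_check(window, thetas=(0.5, 1.0, 2.0, 4.0, 8.0), C=2.0, tol=0.0):
    """One CALIPER-style sufficiency check on a post-drift window (hypothetical sketch).

    window: array of shape (n, d) with post-drift states x_1, ..., x_n.
    Returns (passed, errors); errors[k] is the proxy error at the k-th smallest
    theta, or [] if the ESS gate fails.
    """
    data = np.asarray(window, dtype=float)
    # (i) Normalize the window (per-feature z-score, assumed) and split it.
    z = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-12)
    X, Y = z[:-1], z[1:]          # one-step pairs (x_t, x_{t+1})
    Xh, Yh = X[:-1], Y[:-1]       # reference pairs (X_h, Y_h)
    xq, yq = X[-1], Y[-1]         # query pair: the most recent transition
    n, d = Xh.shape
    dist = np.linalg.norm(Xh - xq, axis=1)
    r = dist / (dist.mean() + 1e-12)  # scale-free distances (assumed)

    # (ii) ESS gate at the largest theta (Kish's formula, assumed).
    w_max = np.exp(-max(thetas) * r)
    ess = w_max.sum() ** 2 / (w_max ** 2).sum()
    if ess < C * (d + 1):
        return False, []

    # (iii) Weighted local affine regression for each theta on the grid.
    A = np.hstack([np.ones((n, 1)), Xh])  # intercept + d slopes = d + 1 terms
    aq = np.concatenate([[1.0], xq])
    errors = []
    for theta in sorted(thetas):
        sw = np.sqrt(np.exp(-theta * r))[:, None]
        # Scaling rows by sqrt(w) turns weighted least squares into ordinary
        # least squares, which lstsq solves without forming an explicit inverse.
        beta, *_ = np.linalg.lstsq(A * sw, Yh * sw, rcond=None)
        y_hat = aq @ beta
        errors.append(float(np.sum((y_hat - yq) ** 2)))

    # (iv) Test: the error must be monotonically non-increasing in theta.
    passed = all(b <= a + tol for a, b in zip(errors, errors[1:]))
    return passed, errors


class RetrainTrigger:
    """Fires once caliper_check has passed on `patience` consecutive updates."""

    def __init__(self, patience=3, **check_kwargs):
        self.patience = patience
        self.check_kwargs = check_kwargs
        self.window = []
        self.streak = 0

    def update(self, x_t):
        self.window.append(np.atleast_1d(np.asarray(x_t, dtype=float)))
        if len(self.window) < 3:  # need at least one reference pair and one query pair
            return False
        passed, _ = caliper_check(np.vstack(self.window), **self.check_kwargs)
        self.streak = self.streak + 1 if passed else 0
        return self.streak >= self.patience


# Usage on a hypothetical post-drift stream (logistic map, d = 1).
xs = [0.2]
for _ in range(1999):
    xs.append(3.9 * xs[-1] * (1.0 - xs[-1]))
stream = np.array(xs).reshape(-1, 1)

trigger = RetrainTrigger(patience=3)
retrain_at = None
for t, x_t in enumerate(stream):
    if trigger.update(x_t):
        retrain_at = t + 1  # number of post-drift samples collected so far
        break
print("retrain after", retrain_at, "post-drift samples")
```

Because Kish's ESS never exceeds the number of reference pairs, the gate alone rules out retraining on very short windows; the monotonicity test and the consecutive-pass requirement then guard against a single lucky query.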

Theorems & Definitions (4)

Definition 1: Post-drift window and data size
Definition 2: State dependence
Proposition 1: CALIPER-triggered windows exhibit stronger state dependence
Proof of Proposition 1
