Model Assessment and Selection under Temporal Distribution Shift
Elise Han, Chengpiao Huang, Kaizheng Wang
TL;DR
The paper addresses how to assess and select predictors under temporal distribution shift. It introduces an adaptive rolling-window estimator for the current generalization error $L_t(f)$ and uses the same idea to compare two models by estimating the difference of their errors. It then extends to multi-model selection via a single-elimination tournament, with oracle-type guarantees that adapt to unknown nonstationarity patterns. Theoretical analyses and experiments on synthetic and real data demonstrate the method's adaptivity: it performs comparably to large fixed windows in stationary settings while outperforming small-window baselines during shifts. The work provides a practical offline toolkit for robust model evaluation and selection in evolving environments where historical data from past epochs is available.
Abstract
We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.
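To make the three ingredients named above concrete (rolling-window error estimates, pairwise comparison through error differences, and a single-elimination tournament), here is a minimal Python sketch. It is not the authors' algorithm: the window-selection rule is a generic Lepski-style heuristic chosen for illustration, the width `c / sqrt(n)` assumes losses bounded in [0, 1], and the names `rolling_window_estimates`, `select_window`, `compare_models`, and `tournament` are all hypothetical. It also assumes every model is scored on the same samples in each period, so losses can be differenced elementwise.

```python
import numpy as np


def rolling_window_estimates(losses_by_period, windows):
    """Pool the last k periods (oldest first in the list) for each window k.

    Returns {k: (pooled mean loss, pooled sample count)}.
    """
    out = {}
    for k in windows:
        pooled = np.concatenate(losses_by_period[-k:])
        out[k] = (float(pooled.mean()), pooled.size)
    return out


def select_window(losses_by_period, windows, c=1.0):
    """Illustrative Lepski-style rule (an assumption, not the paper's rule).

    Grow the window while its estimate agrees with every smaller window's
    estimate up to the sum of their heuristic widths c / sqrt(n).
    """
    windows = sorted(windows)
    est = rolling_window_estimates(losses_by_period, windows)
    width = {k: c / np.sqrt(est[k][1]) for k in windows}
    chosen = windows[0]
    for k in windows:
        agrees = all(
            abs(est[k][0] - est[j][0]) <= width[k] + width[j]
            for j in windows
            if j < k
        )
        if not agrees:
            break  # older data looks inconsistent with recent data
        chosen = k
    return chosen, est[chosen][0]


def compare_models(losses_a, losses_b, windows, c=1.0):
    """Estimate the current error difference L(a) - L(b) from paired losses."""
    diffs = [la - lb for la, lb in zip(losses_a, losses_b)]
    _, gap = select_window(diffs, windows, c)
    return gap


def tournament(losses, windows, c=1.0):
    """Single-elimination tournament over model names in `losses`.

    `losses` maps a model name to its list of per-period loss arrays.
    A model without an opponent in a round advances on a bye.
    """
    pool = list(losses)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            gap = compare_models(losses[a], losses[b], windows, c)
            nxt.append(a if gap <= 0 else b)  # smaller estimated error wins
        if len(pool) % 2 == 1:
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]
```

For example, with eight periods of 100 samples each, if model `b` has loss 0.9 in the six oldest periods and 0.1 in the two most recent, while model `c` has loss 0.3 throughout, averaging over all eight periods would favor `c`. The sketch stops growing the window once older periods disagree with recent ones, so it compares the models on the last two periods only and returns `b`.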
