Model Assessment and Selection under Temporal Distribution Shift

Elise Han, Chengpiao Huang, Kaizheng Wang

TL;DR

The paper studies how to assess and select predictors under temporal distribution shift. It introduces an adaptive rolling-window estimator for the current generalization error $L_t(f)$ and a framework for pairwise model comparison, then extends this to multi-model selection via a single-elimination tournament with oracle-type guarantees that adapt to unknown nonstationarity patterns. Theoretical analysis and experiments on synthetic and real data demonstrate the method's adaptivity: it performs comparably to large fixed windows in stationary settings and outperforms small-window baselines during shifts. The result is a practical offline toolkit for robust model evaluation and selection in evolving environments, using historical data from past epochs.
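As a rough illustration of the adaptive rolling-window idea, the following Python sketch picks a look-back window by trading a Bernstein-style confidence width against a data-driven bias proxy, in the spirit of Goldenshluger-Lepski-type selection rules. It is a sketch under stated assumptions, not the paper's algorithm: the function names, the constants in the width term, the default `delta`, and the use of the plug-in sample variance are all illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def window_stats(
    periods: Sequence[Sequence[float]], k: int, delta: float, loss_range: float
) -> Tuple[float, float]:
    """Mean loss and a Bernstein-style width over the last k periods.

    Assumption: the width uses the plug-in sample variance and illustrative
    constants; the paper's exact quantities may differ.
    """
    pooled = [x for period in periods[-k:] for x in period]
    n = len(pooled)
    mean = sum(pooled) / n
    var = sum((x - mean) ** 2 for x in pooled) / n
    log_term = math.log(2.0 / delta)
    width = math.sqrt(2.0 * var * log_term / n) + 2.0 * loss_range * log_term / (3.0 * n)
    return mean, width


def adaptive_rolling_window(
    periods: Sequence[Sequence[float]], delta: float = 0.1, loss_range: float = 1.0
) -> Tuple[int, float]:
    """Return (selected window size, estimated current loss).

    periods[j] holds the observed losses of one fixed model in period j,
    ordered from oldest to most recent.
    """
    if not periods or any(len(p) == 0 for p in periods):
        raise ValueError("every period needs at least one observed loss")
    t = len(periods)
    stats: List[Tuple[float, float]] = [
        window_stats(periods, k, delta, loss_range) for k in range(1, t + 1)
    ]
    best_k, best_score = 1, math.inf
    for k in range(1, t + 1):
        mean_k, width_k = stats[k - 1]
        # Bias proxy: disagreement with every shorter window that the
        # two confidence widths cannot explain (clipped at zero).
        bias = 0.0
        for i in range(1, k):
            mean_i, width_i = stats[i - 1]
            bias = max(bias, abs(mean_k - mean_i) - (width_k + width_i))
        score = bias + width_k
        if score < best_score:  # ties keep the shorter window
            best_k, best_score = k, score
    return best_k, stats[best_k - 1][0]
```

With stationary losses the bias proxy stays at zero and the rule favors the longest window, since its width is smallest; after an abrupt change, windows that reach back past the change incur a large bias proxy and are penalized.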

Abstract

We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.
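The tournament step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a user-supplied pairwise comparison routine (the hypothetical `pick_winner` callable) and random pairing with a bye for an unpaired candidate; the paper's actual pairing scheme and comparison rule may differ.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def single_elimination(
    candidates: Sequence[T],
    pick_winner: Callable[[T, T], T],
    seed: int = 0,
) -> T:
    """Run a single-elimination tournament and return the surviving candidate.

    pick_winner(a, b) is a hypothetical pairwise comparison that returns
    whichever of the two candidates it judges to have the smaller error.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    rng = random.Random(seed)
    survivors: List[T] = list(candidates)
    while len(survivors) > 1:
        rng.shuffle(survivors)  # random pairing in each round
        next_round: List[T] = []
        for i in range(0, len(survivors) - 1, 2):
            next_round.append(pick_winner(survivors[i], survivors[i + 1]))
        if len(survivors) % 2 == 1:
            next_round.append(survivors[-1])  # unpaired candidate gets a bye
        survivors = next_round
    return survivors[0]
```

Each comparison eliminates exactly one candidate, so selecting among $m$ candidates takes $m - 1$ pairwise comparisons rather than the $m(m-1)/2$ a full round-robin would need.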

Paper Structure (29 sections, 12 theorems, 76 equations, 6 figures, 4 tables, 4 algorithms)


Key Result

Lemma 3.1

Let $\{ x_i \}_{i=1}^n$ be independent random variables taking values in $[a, b]$ almost surely. Define the average variance $\sigma^2 = \frac{1}{n} \sum_{i=1}^n \operatorname{var}(x_i)$. For any $\delta \in (0, 1)$, with probability at least $1-\delta$,
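
The displayed inequality is not reproduced in this summary. For orientation only, and as an assumption rather than a quotation of the lemma, the classical two-sided Bernstein inequality under the same hypotheses (using $|x_i - \mathbb{E} x_i| \le b - a$ and a union bound over the two tails) gives a bound of the form

$$
\left| \frac{1}{n} \sum_{i=1}^n \left( x_i - \mathbb{E} x_i \right) \right| \le \sigma \sqrt{ \frac{2 \log (2/\delta)}{n} } + \frac{2 (b-a) \log (2/\delta)}{3n} .
$$

The first term is the variance-driven rate and the second is a lower-order range term; the constants in the paper's statement may differ.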

Figures (6)

  • Figure 1: True means $\{ \mu_t \}_{t=0}^{100}$ in the synthetic data.
  • Figure 2: Excess risks of different model selection methods in \ref{eg-syn-1}. Left: $\sigma^2 = 1$. Right: $\sigma^2 = 10$. Red: $\mathcal{V}_{\rm ARW}$. Orange: $\mathcal{V}_1$. Blue: $\mathcal{V}_{256}$.
  • Figure 3: Excess risks of different model selection methods in \ref{eg-syn-2}. Left: $\sigma^2 = 1$. Right: $\sigma^2 = 10$. Red: $\mathcal{V}_{\rm ARW}$. Orange: $\mathcal{V}_1$. Blue: $\mathcal{V}_{256}$.
  • Figure 4: Error curves of different model selection methods on the arXiv data. Red: $\mathcal{V}_{\rm ARW}$. Orange: $\mathcal{V}_1$. Blue: $\mathcal{V}_{256}$.
  • Figure 5: Error curves of different model selection methods on the housing data. Red: $\mathcal{V}_{\rm ARW}$. Orange: $\mathcal{V}_1$. Blue: $\mathcal{V}_{256}$.
  • ...and 1 more figure

Theorems & Definitions (20)

  • Lemma 3.1: Bernstein bound
  • Corollary 3.1
  • Lemma 3.2
  • Corollary 3.2
  • Lemma 3.3
  • Theorem 3.1: Oracle inequality
  • Example 3.1: Change point
  • Example 3.2: Bounded drift
  • Lemma 3.4
  • proof
  • ...and 10 more