A Stability Principle for Learning under Non-Stationarity

Chengpiao Huang, Kaizheng Wang

TL;DR

This work tackles learning under non-stationarity by introducing SAWS, a stability-based online window-selection method that adaptively pools historical data while controlling bias relative to stochastic error. The core ideas are a novel $(\varepsilon,\delta)$-closeness measure between loss functions and a segmentation technique that partitions the non-stationary sequence into quasi-stationary pieces, enabling regret bounds that adapt to unknown non-stationarity. The authors establish minimax-optimal regret guarantees in both strongly convex and Lipschitz settings, complemented by lower bounds, and demonstrate practical effectiveness on electricity demand prediction and hospital nurse staffing. The framework generalizes beyond traditional rolling-window approaches, providing a principled, data-driven mechanism to balance bias and variance in changing environments with no prior non-stationarity information. Overall, SAWS offers a solid theoretical and empirical foundation for adaptive online learning under distribution shifts, with potential extensions to bandits and reinforcement learning.

Abstract

We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptivity of this approach to unknown non-stationarity. We prove regret bounds that are minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces. We evaluate the practical performance of our approach through real-data experiments on electricity demand prediction and hospital nurse staffing.
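The window-selection idea in the abstract can be illustrated on a toy scalar mean-estimation problem. The sketch below is not the authors' SAWS algorithm. It is a minimal illustration under assumptions of our own: a known noise level `sigma`, a geometric grid of candidate windows, and a hypothetical threshold multiplier `tau`. A window counts as stable when its sample mean agrees with the mean of every smaller window up to that smaller window's stochastic error, and the largest stable window is returned.

```python
import numpy as np


def select_window(history, sigma, tau=2.0):
    """Pick a look-back window length for a 1-D mean-estimation toy problem.

    history: non-empty 1-D sequence of past observations, oldest first.
    sigma:   noise standard deviation (assumed known in this sketch).
    tau:     hypothetical threshold multiplier, not taken from the paper.
    """
    history = np.asarray(history, dtype=float)
    n = len(history)
    if n == 0:
        raise ValueError("history must contain at least one observation")

    # Candidate window lengths on a geometric grid: 1, 2, 4, ..., then n.
    candidates = []
    k = 1
    while k < n:
        candidates.append(k)
        k *= 2
    candidates.append(n)

    # Sample mean over the most recent k observations, for each candidate k.
    means = [history[n - k:].mean() for k in candidates]

    chosen = candidates[0]
    for i, k in enumerate(candidates):
        # Window k is stable if its mean stays within tau * sigma / sqrt(s)
        # of the mean of every smaller candidate window of length s.
        stable = all(
            abs(means[i] - means[j]) <= tau * sigma / np.sqrt(candidates[j])
            for j in range(i)
        )
        if not stable:
            break  # bias now exceeds the stochastic error; stop growing
        chosen = k
    return chosen


# Noise-free toy data: the mean jumps from 0 to 1 eight periods ago.
shifted = np.concatenate([np.zeros(50), np.ones(8)])
window_after_shift = select_window(shifted, sigma=0.1)
window_stationary = select_window(np.zeros(58), sigma=0.1)
```

On the shifted sequence the selected window covers only the eight post-change observations, while on the stationary sequence it grows to the full history. This is the bias-versus-stochastic-error trade-off the abstract describes.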

Paper Structure

This paper contains 60 sections, 29 theorems, 230 equations, 7 figures, 1 table, 5 algorithms.

Key Result

Lemma 4.1

Suppose $\{\bm{\theta}_n^*\}_{n=1}^N$ consists of $J$ quasi-stationary segments, and define $V = \sum_{n=1}^{N-1} \| \bm{\theta}_{n+1}^* - \bm{\theta}_n^* \|_2$. Then $J \le 1 + C(BN/d)^{1/3} V^{2/3}$, where $C>0$ is a constant depending on $M$, $\rho$ and $\sigma$.
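To make the quantities in Lemma 4.1 concrete, the following sketch computes the path variation $V$ of a given sequence of minimizers and evaluates the right-hand side $1 + C(BN/d)^{1/3}V^{2/3}$. The function names are hypothetical, and the constant `C` is not specified in this summary, so it is left as a user-supplied argument. The symbols `B`, `N` and `d` are passed through exactly as they appear in the lemma.

```python
import numpy as np


def path_variation(thetas):
    """V = sum over n of ||theta_{n+1} - theta_n||_2 for a sequence of points."""
    thetas = np.asarray(thetas, dtype=float)
    if thetas.ndim == 1:
        # Treat a flat sequence as scalar parameters (dimension 1).
        thetas = thetas[:, None]
    diffs = np.diff(thetas, axis=0)  # consecutive differences
    return float(np.linalg.norm(diffs, axis=1).sum())


def segment_count_bound(V, N, B, d, C):
    """Right-hand side of Lemma 4.1; C is an unspecified constant (assumption)."""
    return 1.0 + C * (B * N / d) ** (1.0 / 3.0) * V ** (2.0 / 3.0)


# Scalar example: differences are 0, 1, 0, 2, so V = 3.
V_scalar = path_variation([0.0, 0.0, 1.0, 1.0, 3.0])
# Two-dimensional example: a single step of Euclidean length 5.
V_vector = path_variation([[0.0, 0.0], [3.0, 4.0]])
# Illustrative values only: (1 * 27 / 1)^(1/3) = 3 and 8^(2/3) = 4.
bound = segment_count_bound(V=8.0, N=27, B=1, d=1, C=1.0)
```

Since the bound grows like $V^{2/3}$, a sequence with zero path variation gives a bound of exactly one segment.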

Figures (7)

  • Figure 1: Visualization of segmentation for Example 4.1 (Gaussian mean estimation). Horizontal axis: time $n$. Vertical axis: values of $\theta_n^* \in \mathbb{R}$. Black curve: trajectory of $\{\theta_n^*\}_{n=1}^N$. Gray dots: samples from $N(\theta_n^*,0.01)$. Blue curve: quasi-stationary segments of $\{\theta_n^*\}_{n=1}^N$. The sequence $\{\theta_n^*\}_{n=1}^N$ is approximated by multiple constant segments, and within each segment $\theta_n^*$ varies only slightly.
  • Figure 2: Several non-stationarity patterns in \ref{eg-nonstationary-seq-strongly-cvx}.
  • Figure 3: Log-log plots of dynamic regrets of SAWS and fixed-window benchmarks on synthetic data. Horizontal axis: time horizon $N\in\mathcal{N}$. Vertical axis: logarithm of dynamic regret $\log_2\sum_{n=1}^N[F_n(\bm{\theta}_n) - \inf_{\bm{\theta}'\in\Omega} F_n(\bm{\theta}')]$. Red circles: SAWS (\ref{alg-online}). Orange triangles: $\mathrm{MA}(\lceil N^{1/3} \rceil)$. Blue squares: $\mathrm{MA}(\lceil N^{1/2} \rceil)$. Purple $\times$'s: $\mathrm{MA}(\lceil N^{2/3} \rceil)$. Black $+$'s: $\mathrm{MA}(N)$.
  • Figure 4: Per-period losses of SAWS and fixed-window benchmarks on the electricity data and the ED visits data. Horizontal axis: algorithms. Vertical axis: per-period loss. For the electricity data, the predicted and true demands (unit: megawatt-hour) are scaled by $5\times 10^{-4}$.
  • Figure 5: Rolling windows of SAWS on the electricity data and the ED visits data. Horizontal axis: time period $n$. Vertical axis: endpoints of look-back windows. Lower black curve: left endpoints. Upper black curve: right endpoints ($n-1$).
  • ...and 2 more figures
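
The dynamic regret plotted in Figure 3 and the fixed-window benchmark $\mathrm{MA}(k)$ can be sketched for a scalar mean-estimation problem as follows. This is an illustration, not the paper's experimental code. It assumes the population loss $F_n(\theta) = \tfrac{1}{2}(\theta - \theta_n^*)^2$, so that $\inf F_n = 0$. It also assumes that $\mathrm{MA}(k)$ predicts with the average of the $k$ most recent observations up to period $n-1$, and it uses a hypothetical initial guess `init` in the first period, when no data is available.

```python
import numpy as np


def moving_average_regret(obs, theta_star, k, init=0.0):
    """Dynamic regret of a fixed-window moving average MA(k).

    obs:        observations y_1, ..., y_N (y_n is revealed after period n).
    theta_star: true means theta_1^*, ..., theta_N^*.
    k:          fixed look-back window length.
    init:       hypothetical prediction used before any data is available.
    """
    obs = np.asarray(obs, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    regret = 0.0
    for n in range(len(obs)):
        # Look-back window ends at the previous period (index n - 1).
        past = obs[max(0, n - k):n]
        theta_n = past.mean() if past.size else init
        # Assumed loss F_n(theta) = 0.5 * (theta - theta_n^*)^2, minimum 0.
        regret += 0.5 * (theta_n - theta_star[n]) ** 2
    return regret


# Noise-free toy sequence with one jump: a short window reacts faster.
truth = np.array([0.0, 0.0, 1.0, 1.0])
regret_k1 = moving_average_regret(truth, truth, k=1)
regret_k2 = moving_average_regret(truth, truth, k=2)
```

In this noise-free example the shorter window incurs less regret because it forgets the pre-jump data sooner. With noisy observations a longer window would reduce variance instead, which is the trade-off that motivates selecting the window adaptively.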

Theorems & Definitions (56)

  • Example 4.1: Gaussian mean estimation
  • Example 4.2: Linear regression
  • Example 4.3: Logistic regression
  • Example 4.4: Robust linear regression
  • Definition 4.1: Segmentation
  • Lemma 4.1: From path variation to segmentation
  • Example 4.5
  • Theorem 4.1: Regret bound
  • Corollary 4.1: PV-based regret bound
  • Remark 1: Other variation metrics
  • ...and 46 more