A Stability Principle for Learning under Non-Stationarity

Chengpiao Huang; Kaizheng Wang

TL;DR

This work tackles learning under non-stationarity by introducing SAWS, a stability-based online window-selection method that adaptively pools historical data while keeping bias small relative to stochastic error. The core ideas are a novel $(\varepsilon,\delta)$-closeness measure between loss functions and a segmentation technique that partitions the non-stationary sequence into quasi-stationary pieces, which together yield regret bounds that adapt to unknown non-stationarity. The authors establish regret guarantees that are minimax optimal up to logarithmic factors in both the strongly convex and the Lipschitz settings, complemented by lower bounds, and demonstrate practical effectiveness on electricity demand prediction and hospital nurse staffing. The framework goes beyond traditional fixed rolling-window approaches, providing a principled, data-driven mechanism for balancing bias and variance in changing environments without prior knowledge of the non-stationarity. Overall, SAWS offers a solid theoretical and empirical foundation for adaptive online learning under distribution shifts, with potential extensions to bandits and reinforcement learning.

Abstract

We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptivity of this approach to unknown non-stationarity. We prove regret bounds that are minimax optimal up to logarithmic factors when the population losses are either strongly convex or merely Lipschitz. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces. We evaluate the practical performance of our approach through real-data experiments on electricity demand prediction and hospital nurse staffing.
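To make the idea of choosing a look-back window by stability concrete, the following Python sketch applies a Lepski-style check to scalar mean estimation. It is a simplified stand-in, not the SAWS algorithm from the paper: the doubling grid of candidate windows, the acceptance threshold `tau * sigma / sqrt(k')`, the assumption of a known noise level `sigma`, and the function name `select_window` are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def select_window(history: Sequence[float], sigma: float, tau: float = 2.0) -> int:
    """Pick a look-back window with a Lepski-style stability check (hypothetical sketch).

    Candidate windows are 1, 2, 4, ... plus the full history. The estimate for a
    window k is the mean of the k most recent observations. Windows are scanned
    from short to long; a window is accepted only if its mean lies within
    tau * sigma / sqrt(k') of the mean of every shorter candidate k'. The scan
    stops at the first window that fails, and the last accepted one is returned.
    """
    n = len(history)
    if n == 0:
        raise ValueError("need at least one past observation")

    # Doubling grid of candidate windows strictly below n, then n itself.
    candidates = []
    k = 1
    while k < n:
        candidates.append(k)
        k *= 2
    candidates.append(n)

    # Sample mean over the most recent k observations for each candidate.
    means = {k: sum(history[n - k:]) / k for k in candidates}

    chosen = candidates[0]
    for i, k in enumerate(candidates):
        # Compare against every shorter window; its stochastic error scales like sigma / sqrt(k').
        stable = all(
            abs(means[k] - means[kp]) <= tau * sigma / math.sqrt(kp)
            for kp in candidates[:i]
        )
        if not stable:
            break  # pooling further back would add too much bias
        chosen = k
    return chosen
```

With stationary data, e.g. `select_window([0.0] * 64, 1.0)`, every window passes and the full history is pooled; after a recent level shift, e.g. `select_window([10.0] * 60 + [0.0] * 4, 1.0)`, the check stops at a short window, accepting more variance in exchange for much lower bias.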

Paper Structure (60 sections, 29 theorems, 230 equations, 7 figures, 1 table, 5 algorithms)

Introduction
Main contributions.
Related works.
Outline.
Problem Setup
Notation.
A Stability Principle for Adapting to Non-Stationarity
Choosing between Two Windows: To Pool or Not to Pool?
Choosing from Multiple Windows
Efficiency Improvements
Regret Analysis in Common Settings
Strongly Convex Population Losses
Lipschitz Population Losses
A General Theory of Learning under Non-Stationarity
Overview
...and 45 more sections

Key Result

Lemma 4.1

Suppose $\{\bm{\theta}_n^*\}_{n=1}^N$ consists of $J$ quasi-stationary segments, and define $V = \sum_{n=1}^{N-1} \| \bm{\theta}_{n+1}^* - \bm{\theta}_n^* \|_2$. Then $J \le 1 + C(BN/d)^{1/3} V^{2/3}$, where $C>0$ is a constant depending on $M$, $\rho$ and $\sigma$.
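The path variation $V$ and the segment-count bound in Lemma 4.1 can be evaluated numerically. In this sketch the constant $C$, which the lemma specifies only through its dependence on $M$, $\rho$ and $\sigma$, is a user-supplied placeholder. Reading $d$ as the parameter dimension and $B$ as the number of samples per period is an assumption, since this excerpt does not define them. The helper names `path_variation` and `segment_count_bound` are hypothetical.

```python
import math
from typing import Sequence


def path_variation(thetas: Sequence[Sequence[float]]) -> float:
    """V = sum_{n=1}^{N-1} ||theta_{n+1} - theta_n||_2 over a sequence of parameter vectors."""
    return sum(math.dist(a, b) for a, b in zip(thetas[:-1], thetas[1:]))


def segment_count_bound(V: float, N: int, B: int, d: int, C: float) -> float:
    """Upper bound 1 + C * (B*N/d)^(1/3) * V^(2/3) on the number of segments J.

    C is a placeholder: the lemma does not give its numerical value.
    B (samples per period) and d (dimension) are assumed interpretations.
    """
    return 1.0 + C * (B * N / d) ** (1.0 / 3.0) * V ** (2.0 / 3.0)
```

For a scalar trajectory `[[0.0], [1.0], [1.0], [4.0]]` the path variation is $1 + 0 + 3 = 4$. A constant trajectory has $V = 0$, so the bound collapses to a single segment, as expected for stationary data.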

Figures (7)

Figure 1: Visualization of segmentation for Example 4.1 (Gaussian mean estimation). Horizontal axis: time $n$. Vertical axis: values of $\theta_n^* \in \mathbb{R}$. Black curve: trajectory of $\{\theta_n^*\}_{n=1}^N$. Gray dots: samples from $N(\theta_n^*,0.01)$. Blue curve: quasi-stationary segments of $\{\theta_n^*\}_{n=1}^N$. The sequence $\{\theta_n^*\}_{n=1}^N$ is approximated by multiple constant segments, and within each segment $\theta_n^*$ varies only slightly.
Figure 2: Several non-stationarity patterns from the example of non-stationary sequences in the strongly convex setting.
Figure 3: Log-log plots of dynamic regrets of SAWS and fixed-window benchmarks on synthetic data. Horizontal axis: time horizon $N\in\mathcal{N}$. Vertical axis: logarithm of dynamic regret $\log_2\sum_{n=1}^N[F_n(\bm{\theta}_n) - \inf_{\bm{\theta}'\in\Omega} F_n(\bm{\theta}')]$. Red circles: SAWS (online algorithm). Orange triangles: $\mathrm{MA}(\lceil N^{1/3} \rceil)$. Blue squares: $\mathrm{MA}(\lceil N^{1/2} \rceil)$. Purple $\times$'s: $\mathrm{MA}(\lceil N^{2/3} \rceil)$. Black $+$'s: $\mathrm{MA}(N)$.
Figure 4: Per-period losses of SAWS and fixed-window benchmarks on the electricity data and the ED visits data. Horizontal axis: algorithms. Vertical axis: per-period loss. For the electricity data, the predicted and true demands (unit: megawatt-hour) are scaled by $5\times 10^{-4}$.
Figure 5: Rolling windows of SAWS on the electricity data and the ED visits data. Horizontal axis: time period $n$. Vertical axis: endpoints of look-back windows. Lower black curve: left endpoints. Upper black curve: right endpoints ($n-1$).
...and 2 more figures
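The dynamic regret plotted in Figure 3 can be illustrated for the fixed-window benchmarks $\mathrm{MA}(k)$ in a one-dimensional Gaussian mean problem. This sketch assumes the squared loss, so that $F_n(\theta) - \inf_{\theta'} F_n(\theta') = (\theta - \theta_n^*)^2/2$; it predicts with the average of the last $k$ samples (or 0 before any data arrive) and then draws one sample from $N(\theta_n^*, \sigma^2)$ per period. These modeling choices and the name `dynamic_regret_ma` are illustrative assumptions, not the paper's experimental code.

```python
import random
from typing import List, Sequence


def dynamic_regret_ma(
    thetas: Sequence[float], window: int, sigma: float, seed: int = 0
) -> float:
    """Dynamic regret of a fixed-window moving average MA(window) (hypothetical setup).

    Assumes F_n(theta) = E[(theta - X_n)^2] / 2 with X_n ~ N(theta_n^*, sigma^2),
    so the per-period excess loss is (theta - theta_n^*)^2 / 2.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rng = random.Random(seed)
    samples: List[float] = []
    regret = 0.0
    for theta_star in thetas:
        # Predict from the most recent `window` samples (0.0 before any data).
        recent = samples[-window:]
        estimate = sum(recent) / len(recent) if recent else 0.0
        regret += 0.5 * (estimate - theta_star) ** 2
        # Only after predicting, observe one fresh sample for this period.
        samples.append(rng.gauss(theta_star, sigma))
    return regret
```

For a horizon $N$, the four benchmarks in Figure 3 correspond to `window=math.ceil(N ** (1/3))`, `math.ceil(N ** 0.5)`, `math.ceil(N ** (2/3))` and `N`. On a noiseless sequence with a single jump, a short window recovers quickly while $\mathrm{MA}(N)$ keeps averaging in stale pre-jump data, which is the bias side of the trade-off that SAWS balances adaptively.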

Theorems & Definitions (56)

Example 4.1: Gaussian mean estimation
Example 4.2: Linear regression
Example 4.3: Logistic regression
Example 4.4: Robust linear regression
Definition 4.1: Segmentation
Lemma 4.1: From path variation to segmentation
Example 4.5
Theorem 4.1: Regret bound
Corollary 4.1: PV-based regret bound
Remark 1: Other variation metrics
...and 46 more