Fast Rates for Nonstationary Weighted Risk Minimization

Tobias Brock, Thomas Nagler

TL;DR

This work develops tight, uniform fast-rate bounds for weighted empirical risk minimization under nonstationary and dependent data. By decomposing the out-of-sample excess risk into a learning error and a drift error, the authors derive an oracle inequality that holds uniformly over weight classes and yields rate functions $r(\|w\|)$ capturing the effective sample size and dependence structure via $\beta$- and $\rho$-mixing. The main result provides high-probability bounds on the learning error that scale as $r(\|w\|)^2\log^2(1/\delta)$, with explicit rate forms under common complexity assumptions; these rates recover minimax-optimal rates (up to log factors) in unweighted stationary settings. The paper applies the theory to linear models, basis expansions, and neural networks, demonstrating sharp, dimension-adaptive rates and highlighting the impact of drift and dependence on learning performance. It further develops the technical machinery for $\beta$-mixing concentration through coupling, discretization of weight and hypothesis classes, and analysis of several weight families relevant for practical nonstationary scenarios.

Abstract

Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess risk into a learning term and an error term associated with distribution drift, and prove oracle inequalities for the learning error under mixing conditions. The learning bound holds uniformly over arbitrary weight classes and accounts for the effective sample size induced by the weight vector, the complexity of the weight and hypothesis classes, and potential data dependence. We illustrate the applicability and sharpness of our results in (auto-)regression problems with linear models, basis approximations, and neural networks, recovering minimax-optimal rates (up to logarithmic factors) when specialized to unweighted and stationary settings.
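
To make the idea of weighted empirical risk minimization under drift concrete, the following is a minimal sketch and not the authors' procedure. It fits a linear model by weighted least squares on simulated data whose slope drifts over time, and compares uniform weights with exponentially decaying weights. Everything here is an assumption made for illustration: the data-generating process, the decay factor `0.98`, the helper names, and in particular the Kish-style effective sample size $1/\|w\|_2^2$, since this summary does not state which norm enters $r(\|w\|)$.

```python
import numpy as np


def exponential_weights(n, decay):
    """Weights proportional to decay**age, normalized to sum to one.

    The most recent observation (index n - 1) has age 0 and the largest weight.
    """
    ages = np.arange(n - 1, -1, -1)
    w = decay ** ages
    return w / w.sum()


def effective_sample_size(w):
    """Assumed definition: 1 / ||w||_2^2 after normalizing w to sum to one."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)


def weighted_least_squares(X, y, w):
    """Minimize sum_t w_t * (y_t - x_t^T theta)^2 over theta."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta


def simulate_drift(n, rng):
    """Hypothetical drifting regression: the slope moves linearly from 0 to 2."""
    x = rng.normal(size=n)
    slope = np.linspace(0.0, 2.0, n)
    y = slope * x + 0.1 * rng.normal(size=n)
    return x[:, None], y


rng = np.random.default_rng(0)
n = 500
X, y = simulate_drift(n, rng)

uniform = np.full(n, 1.0 / n)
recent = exponential_weights(n, decay=0.98)

theta_uniform = weighted_least_squares(X, y, uniform)
theta_recent = weighted_least_squares(X, y, recent)

print("effective sample size (uniform):    ", effective_sample_size(uniform))
print("effective sample size (exponential):", effective_sample_size(recent))
print("slope estimate (uniform):    ", theta_uniform[0])
print("slope estimate (exponential):", theta_recent[0])
```

In this toy setting the slope at the final time point is 2. Uniform weights average over the whole history, so the estimate sits near the mid-range slope, while the exponential weights track the recent slope at the cost of a much smaller effective sample size. This is the trade-off between drift error and learning error that the abstract describes.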

Paper Structure (23 sections, 16 theorems, 137 equations)

Key Result

Theorem 3

Fix $\delta\in(0,1)$ and $K \in (0, \infty)$, and assume conditions cond:loss-L2 through cond:simplify. Let $\mathcal{W}$ be a class of weight vectors and define the following quantities: Let $r\colon [C_\mathcal{W},C_1]\to[0,\infty)$ be increasing and $K$-Lipschitz, and suppose that for all $w\in \mathcal{W}$, Then, with probability at least $1-2\delta$, it holds that

Theorems & Definitions (20)

  • Example 1
  • Definition 1: $\beta$-mixing
  • Definition 2: $\rho$-mixing
  • Example 2
  • Theorem 3
  • Proposition 4
  • Corollary 5
  • Theorem 6
  • Lemma 7
  • Lemma 8
  • ...and 10 more
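
For orientation, the two dependence notions named in Definitions 1 and 2 are usually stated as follows in the textbook literature. This is a sketch of the standard forms, not a quotation from the paper, whose definitions may differ (for example in how the supremum over time is taken for nonstationary sequences). The process $(Z_t)$ and the $\sigma$-fields $\mathcal{F}_{-\infty}^{t} = \sigma(Z_s : s \le t)$ and $\mathcal{F}_{t+k}^{\infty} = \sigma(Z_s : s \ge t+k)$ are notation assumed here.

$$
\beta(k) = \sup_{t} \, \mathbb{E}\Big[ \sup_{B \in \mathcal{F}_{t+k}^{\infty}} \big| \mathbb{P}(B \mid \mathcal{F}_{-\infty}^{t}) - \mathbb{P}(B) \big| \Big],
\qquad
\rho(k) = \sup_{t} \, \sup\Big\{ |\operatorname{Corr}(f, g)| : f \in L^2(\mathcal{F}_{-\infty}^{t}),\ g \in L^2(\mathcal{F}_{t+k}^{\infty}) \Big\}.
$$

A process is called $\beta$-mixing (respectively $\rho$-mixing) if $\beta(k) \to 0$ (respectively $\rho(k) \to 0$) as $k \to \infty$. Independent sequences satisfy $\beta(k) = \rho(k) = 0$ for all $k \ge 1$.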