
Adaptive Estimation and Learning under Temporal Distribution Shift

Dheeraj Baby, Yifei Tang, Hieu Duy Nguyen, Yu-Xiang Wang, Rohit Pyati

TL;DR

The paper tackles estimation and learning under temporal distribution shift by treating the problem as non-stationary signal estimation. It shows that wavelet soft-thresholding, particularly with Haar bases, yields sharp, pointwise error guarantees for the final ground-truth value without requiring prior drift information, and extends to sparsity-based bounds with higher-order wavelets. It further connects these estimation insights to learning under drift, deriving oracle-efficient ERM objectives and excess-risk bounds, and reveals a general minimax-optimality link to total-variation denoising. Empirical results on synthetic and real data corroborate the theoretical gains, and the work provides practical guidance on wavelet choice while highlighting connections to TV-denoising theory and potential extensions to change-point detection.

Abstract

In this paper, we study the problem of estimation and learning under temporal distribution shift. Consider an observation sequence of length $n$ that is a noisy realization of a time-varying ground-truth sequence. Our focus is on developing methods that estimate the ground truth at the final time step while providing sharp point-wise estimation error rates. We show that, without prior knowledge of the level of temporal shift, a wavelet soft-thresholding estimator provides an optimal estimation error bound for the ground truth. Our proposed estimation method generalizes existing work by Mazzetto and Upfal (2023) by establishing a connection between the sequence's non-stationarity level and the sparsity in the wavelet-transformed domain. Our theoretical findings are validated by numerical experiments. Additionally, we apply the estimator to derive sparsity-aware excess risk bounds for binary classification under distribution shift and to develop computationally efficient training objectives. As a final contribution, we draw parallels between our results and the classical signal processing problem of total-variation denoising (Mammen and van de Geer, 1997; Tibshirani, 2014), uncovering novel optimal algorithms for this task.
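As a minimal sketch of the estimator described above (not the authors' code), the Python below applies Haar soft-thresholding and reads off the estimate at the latest time step. It assumes that n is a power of two, that the noise level σ is known, and that the universal threshold σ√(2 log n) is used; the paper's threshold may differ. The helpers `haar_matrix`, `soft_threshold`, and `estimate_latest` are hypothetical names. Observations are reversed so that index 0 is the latest step, following the newest-first ordering used in Lemma 0.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix (rows are basis vectors); n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                   # coarser-scale rows, stretched to length n
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale detail rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def estimate_latest(y, sigma):
    """Hypothetical helper: Haar soft-thresholding estimate of the ground truth at the last step.

    y is given oldest-first; sigma is the (assumed known) noise level.
    """
    y_rev = np.asarray(y, dtype=float)[::-1]   # newest first, so index 0 is time n
    n = len(y_rev)
    assert n & (n - 1) == 0, "n must be a power of two"
    W = haar_matrix(n)
    lam = sigma * np.sqrt(2.0 * np.log(n))     # universal threshold (assumed choice)
    coef = W @ y_rev
    coef[1:] = soft_threshold(coef[1:], lam)   # threshold details; keep scaling row 0 as is
    theta_rev = W.T @ coef                     # back to the time domain
    return theta_rev[0]

rng = np.random.default_rng(0)
theta = np.concatenate([np.zeros(192), np.full(64, 2.0)])  # abrupt level shift at t = 193
y = theta + 0.5 * rng.standard_normal(theta.size)
print(estimate_latest(y, sigma=0.5), "vs. truth", theta[-1])
```

The dense n×n matrix keeps the sketch short but costs O(n²); a fast wavelet transform runs in O(n). Leaving the scaling coefficient unthresholded is a design choice here that avoids shrinking the overall level. In practice σ is often estimated from the finest-level detail coefficients, for example by their median absolute deviation.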

Paper Structure

This paper contains 26 sections, 17 theorems, 62 equations, 5 figures, 11 tables, 1 algorithm.

Key Result

Lemma 0

Consider the observation model $y_i = \theta_i + \epsilon_i$ for $i = 1, \ldots, n$, where the $\epsilon_i$ are i.i.d. $\sigma$-sub-Gaussian random variables. Let $\boldsymbol y := [y_n, \ldots, y_1]^T$, $\boldsymbol \theta := [\theta_n, \ldots, \theta_1]^T$, and let $\boldsymbol W$ be an orthonormal wavelet transform matrix …
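To make the objects in Lemma 0 concrete, the sketch below builds an orthonormal Haar matrix $\boldsymbol W$ (assuming $n$ is a power of two; `haar_matrix` is a hypothetical helper) and checks two generic properties of the Haar basis. First, only $\log_2 n + 1$ rows of $\boldsymbol W$ are nonzero at index 0, which holds the latest time step in the newest-first ordering. Second, a piecewise-constant $\boldsymbol\theta$ has few nonzero coefficients in $\boldsymbol W \boldsymbol\theta$, the sparsity that Figure 1 illustrates. These are offered as intuition, not as a restatement of the lemma or its proof.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix; rows are basis vectors, n a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                   # coarser-scale rows, stretched
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale detail rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

n = 64
W = haar_matrix(n)
assert np.allclose(W @ W.T, np.eye(n))  # W is orthonormal

# Newest-first ordering as in Lemma 0: index 0 holds the latest time step.
touch_latest = np.flatnonzero(np.abs(W[:, 0]) > 1e-12)
print(len(touch_latest))  # 7, i.e. log2(n) + 1

# Hypothetical piecewise-constant ground truth (newest first) with two jumps.
theta = np.concatenate([np.full(20, 1.0), np.full(25, -2.0), np.full(19, 3.0)])
coef = W @ theta
nonzero = int(np.sum(np.abs(coef) > 1e-9))
print(nonzero, "of", n)  # at most 1 + 2 * log2(n) = 13 nonzero coefficients
```

Each jump lies inside at most one Haar detail block per level, so two jumps produce at most $2\log_2 n$ nonzero details plus the scaling coefficient.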

Figures (5)

  • Figure 1: We have access to noisy observations of a ground truth signal, and our task is to estimate the ground truth at the latest time. Although the ground truth signal appears to exhibit a non-stationary trend in the time domain, the wavelet transform reveals a sparse set of coefficients. As a result, we only need to estimate a few key coefficients, enabling wavelet denoising-based estimators to achieve sharp point-wise and data-adaptive error rates.
  • Figure 2: Mean house prices vs time in Dubai housing data (reproduced from han2024). The pricing trend is abrupt and highly non-stationary.
  • Figure 3: Daubechies (DB) wavelets with an increasing number of vanishing moments. The Haar system is the special case of the DB family with 1 vanishing moment. As the number of vanishing moments increases, the wavelets become smoother.
  • Figure 4: Ground-truth signals used for experiments.
  • Figure 5: Bound in Lemma [lem:gen-wave], averaged across all time stamps, vs. noise level (semi-log plot). The ground-truth signals used are displayed in Figure 4. Using a higher-order wavelet such as DB8 yields significantly lower values of the bound. This provides empirical evidence that Lemma [lem:gen-wave] can lead to sharper rates than those obtained from Theorem [thm:point-wise]. Recall that the bound in Theorem [thm:point-wise] is only an upper bound on the bound in Lemma [lem:gen-wave] applied to Haar wavelets.
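The vanishing-moment counts mentioned in Figure 3 can be checked from the filter taps alone. A wavelet has p vanishing moments when its high-pass filter g satisfies Σ_k k^m g_k = 0 for m = 0, …, p − 1, so its detail coefficients vanish wherever the signal is locally a polynomial of degree below p. The sketch below assumes the standard closed-form DB2 taps and the quadrature-mirror convention g_k = (−1)^k h_{L−1−k}; other sign or shift conventions leave the count unchanged. The names `LOWPASS`, `highpass`, and `vanishing_moments` are hypothetical.

```python
import numpy as np

# Orthonormal low-pass (scaling) filters. DB1 is Haar; DB2 uses the standard closed form.
s3 = np.sqrt(3.0)
LOWPASS = {
    "haar (db1)": np.array([1.0, 1.0]) / np.sqrt(2.0),
    "db2": np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0)),
}

def highpass(h):
    """Quadrature-mirror wavelet filter g_k = (-1)^k h_{L-1-k}."""
    L = len(h)
    return np.array([(-1) ** k * h[L - 1 - k] for k in range(L)])

def vanishing_moments(h, tol=1e-10, max_p=10):
    """Count the leading moments m = 0, 1, ... with sum_k k^m g_k = 0."""
    g = highpass(h)
    k = np.arange(len(g), dtype=float)
    p = 0
    while p < max_p and abs(np.sum(k ** p * g)) < tol:
        p += 1
    return p

for name, h in LOWPASS.items():
    print(name, vanishing_moments(h))  # "haar (db1) 1", then "db2 2"
```

Higher-order filters such as the DB8 taps behind Figure 5 could be checked the same way once their coefficients are supplied, for example from a wavelet library.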

Theorems & Definitions (25)

  • Lemma 0
  • Theorem 1
  • Corollary 1
  • Lemma 2
  • Definition 4
  • Theorem 5
  • Corollary 6
  • Theorem 7
  • Corollary 7
  • Theorem 8
  • ...and 15 more