Adaptive Estimation and Learning under Temporal Distribution Shift
Dheeraj Baby, Yifei Tang, Hieu Duy Nguyen, Yu-Xiang Wang, Rohit Pyati
TL;DR
The paper tackles estimation and learning under temporal distribution shift by treating the problem as non-stationary signal estimation. It shows that wavelet soft-thresholding, particularly with Haar bases, yields sharp, pointwise error guarantees for the final ground-truth value without requiring prior drift information, and extends to sparsity-based bounds with higher-order wavelets. It further connects these estimation insights to learning under drift, deriving oracle-efficient ERM objectives and excess-risk bounds, and reveals a general minimax-optimality link to total-variation denoising. Empirical results on synthetic and real data corroborate the theoretical gains, and the work provides practical guidance on wavelet choice while highlighting connections to TV-denoising theory and potential extensions to change-point detection.
Abstract
In this paper, we study the problem of estimation and learning under temporal distribution shift. Consider an observation sequence of length $n$, which is a noisy realization of a time-varying ground-truth sequence. Our focus is on developing methods that estimate the ground truth at the final time step while providing sharp pointwise estimation error rates. We show that, without prior knowledge of the level of temporal shift, a wavelet soft-thresholding estimator provides an optimal estimation error bound for the ground truth. Our proposed estimation method generalizes existing work (Mazzetto and Upfal, 2023) by establishing a connection between the sequence's level of non-stationarity and its sparsity in the wavelet-transformed domain. Our theoretical findings are validated by numerical experiments. Additionally, we apply the estimator to derive sparsity-aware excess risk bounds for binary classification under distribution shift and to develop computationally efficient training objectives. As a final contribution, we draw parallels between our results and the classical signal processing problem of total-variation denoising (Mammen and van de Geer, 1997; Tibshirani, 2014), uncovering novel optimal algorithms for this task.
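To make the estimator described in the abstract concrete, the following is a minimal sketch of Haar wavelet soft-thresholding used to estimate the final value of a noisy sequence. It is an illustration under stated assumptions, not the paper's exact procedure: the function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `estimate_last_value`) are hypothetical, the sequence length is assumed to be a power of two, the noise level `sigma` is assumed known, and the universal threshold $\sigma\sqrt{2\log n}$ is a standard textbook choice that may differ from the threshold analyzed in the paper.

```python
import numpy as np


def haar_forward(y):
    """Orthonormal Haar transform of a length-2^k array.

    Returns the coarsest scaling coefficient (array of size 1) and a list
    of detail-coefficient arrays ordered from finest to coarsest scale.
    """
    approx = np.asarray(y, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        approx = (even + odd) / np.sqrt(2.0)         # local averages
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx


def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def estimate_last_value(y, sigma):
    """Denoise y by Haar soft-thresholding and return the final entry.

    Assumptions (not taken from the paper): len(y) is a power of two,
    sigma is the known noise standard deviation, and the threshold is
    the universal one, sigma * sqrt(2 * log n).
    """
    n = len(y)
    if n < 1 or n & (n - 1):
        raise ValueError("length of y must be a positive power of two")
    lam = sigma * np.sqrt(2.0 * np.log(n))
    approx, details = haar_forward(y)
    shrunk = [soft_threshold(d, lam) for d in details]  # scaling coefficient is kept as-is
    return haar_inverse(approx, shrunk)[-1]
```

Calling `estimate_last_value(y, sigma)` on a noisy sequence returns a single number, the denoised value at the last time step. The intuition matching the abstract is that a sequence with little drift has few large Haar detail coefficients, so thresholding removes mostly noise; when the sequence changes sharply, the large coefficients near the change survive and the estimate effectively averages only over the recent, stable portion.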
