The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss
Rongyao Cai, Yuxi Wan, Kexin Zhang, Ming Jin, Hao Wang, Zhiqiang Ge, Daoyi Dong, Yong Liu, Qingsong Wen
TL;DR
The paper tackles a fundamental misalignment in time-series learning: point-wise losses assume i.i.d. data, neglecting causal temporal structure and covariance stationarity. It formalizes the Expectation of Optimization Bias (EOB) as the KL divergence between the true joint distribution and its i.i.d. factorization, revealing a Paradigm Paradox in which more deterministic sequences incur greater bias. Through linear (AR and MGM) and nonlinear (GMM) analyses, it derives closed-form lower bounds for the non-deterministic EOB that depend on sequence length and the Structural Signal-to-Noise Ratio (SSNR), and shows that EOB cannot be eliminated by model capacity alone. The authors propose a principled Debiasing Program implemented via Fourier-based transforms (DFT/DWT), together with a Harmonized ℓ_p norm that stabilizes gradients; extensive synthetic and real-world experiments show reduced bias and improved long-horizon forecasting performance. This work provides a theory-driven path to debiasing time-series training and points to practical, scalable methods for temporal learning.
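The EOB above is defined as a KL divergence between the joint distribution and its i.i.d. factorization. For a zero-mean stationary Gaussian process, the standard closed form is KL(joint ‖ product of marginals) = −½ log det R, where R is the correlation matrix; for an AR(1) process with coefficient φ, det R = (1 − φ²)^(n−1). The sketch below is only an illustration under these assumptions (Gaussian AR(1), divergence taken in the direction joint ‖ product, hypothetical helper names); it is not the paper's derivation and does not use its SSNR-based bound. It computes the divergence numerically, checks it against the closed form, and shows the qualitative trend the paper describes: the divergence grows with sequence length n and with |φ|, i.e. with a more deterministic sequence.

```python
import numpy as np


def ar1_correlation(phi: float, n: int) -> np.ndarray:
    """Correlation matrix R[i, j] = phi**|i - j| of a stationary AR(1) process (hypothetical helper)."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return phi ** lags


def gaussian_iid_kl(cov: np.ndarray) -> float:
    """KL( N(0, cov) || prod_t N(0, cov[t, t]) ), which equals -0.5 * log det(correlation matrix)."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    sign, logdet = np.linalg.slogdet(corr)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return float(-0.5 * logdet)


def ar1_kl_closed_form(phi: float, n: int) -> float:
    """Closed form for AR(1): det R = (1 - phi**2)**(n - 1)."""
    return float(-0.5 * (n - 1) * np.log(1.0 - phi ** 2))


if __name__ == "__main__":
    for n in (16, 64):
        for phi in (0.2, 0.5, 0.9):
            numeric = gaussian_iid_kl(ar1_correlation(phi, n))
            print(f"n={n:3d} phi={phi:.1f} KL={numeric:8.3f} "
                  f"closed form={ar1_kl_closed_form(phi, n):8.3f}")
```

For AR(1), 1/(1 − φ²) is the ratio of the process variance to the innovation variance, so this divergence equals (n − 1)/2 times the log of that ratio; whether that ratio coincides with the paper's SSNR is not established here.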
Abstract
Optimizing time series models via point-wise loss functions (e.g., MSE) relies on a flawed point-wise independent and identically distributed (i.i.d.) assumption that disregards the causal temporal structure, an issue that has drawn growing attention yet still lacks formal theoretical grounding. Focusing on the core independence issue under covariance stationarity, this paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB), formalizing it information-theoretically as the discrepancy between the true joint distribution and its flawed i.i.d. counterpart. Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias induced by point-wise loss functions. We derive the first closed-form quantification of the non-deterministic EOB across linear and non-linear systems, and prove that EOB is an intrinsic data property, governed exclusively by sequence length and our proposed Structural Signal-to-Noise Ratio (SSNR). This theoretical diagnosis motivates a principled debiasing program that eliminates the bias through sequence length reduction and structural orthogonalization. We present a concrete solution that achieves both principles simultaneously via the discrete Fourier transform (DFT) or the discrete wavelet transform (DWT). Furthermore, we propose a novel harmonized $\ell_p$ norm framework to rectify the gradient pathologies of high-variance series. Extensive experiments validate the generality of EOB theory and the superior performance of the debiasing program.
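Structural orthogonalization via the DFT rests on a standard fact: the unitary DFT diagonalizes circulant covariance matrices exactly and the Toeplitz covariance of a stationary process approximately, so the DFT coefficients of a stationary series are close to uncorrelated. As a minimal check, assuming a stationary AR(1) process with φ = 0.9 and n = 64 (illustrative values, not the paper's settings; helper names are hypothetical), the sketch below compares the share of covariance energy lying off the diagonal before and after the transform. It does not implement the paper's debiasing loss or its harmonized ℓ_p norm.

```python
import numpy as np


def ar1_correlation(phi: float, n: int) -> np.ndarray:
    """Toeplitz correlation matrix R[i, j] = phi**|i - j| of a stationary AR(1) process."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return phi ** lags


def unitary_dft_matrix(n: int) -> np.ndarray:
    """Column j is the orthonormal DFT of basis vector e_j, so F @ x == np.fft.fft(x, norm="ortho")."""
    return np.fft.fft(np.eye(n), axis=0, norm="ortho")


def offdiag_energy_fraction(m: np.ndarray) -> float:
    """Share of squared Frobenius norm that lies off the main diagonal."""
    total = np.sum(np.abs(m) ** 2)
    diag = np.sum(np.abs(np.diag(m)) ** 2)
    return float((total - diag) / total)


n, phi = 64, 0.9
cov_time = ar1_correlation(phi, n)
F = unitary_dft_matrix(n)
# Covariance of the DFT coefficients y = F x is F R F^H.
cov_freq = F @ cov_time @ F.conj().T

time_frac = offdiag_energy_fraction(cov_time)
freq_frac = offdiag_energy_fraction(cov_freq)
print(f"off-diagonal energy share: time {time_frac:.3f}, frequency {freq_frac:.3f}")
```

The residual off-diagonal energy in the frequency domain comes from boundary effects, because a finite AR(1) covariance is Toeplitz rather than circulant; it shrinks relative to the total as n grows past the correlation length.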
