The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss

Rongyao Cai, Yuxi Wan, Kexin Zhang, Ming Jin, Hao Wang, Zhiqiang Ge, Daoyi Dong, Yong Liu, Qingsong Wen

TL;DR

The paper tackles a fundamental misalignment in time-series learning: point-wise losses implicitly assume i.i.d. data and thereby ignore the causal temporal dependence of covariance-stationary series. It formalizes the Expectation of Optimization Bias (EOB) as the KL divergence between the true joint distribution and its i.i.d. factorization, revealing a Paradigm Paradox: the more deterministic the sequence, the greater the bias. Through linear (AR and MGM) and nonlinear (GMM) analyses, it derives closed-form lower bounds for the non-deterministic EOB that depend only on sequence length and the Structural Signal-to-Noise Ratio (SSNR), and shows that EOB cannot be eliminated by model capacity alone. The authors propose a principled Debiasing Program implemented via orthogonal transforms (DFT or DWT), together with a Harmonized ℓ_p norm that stabilizes gradients on high-variance series. Extensive synthetic and real-world experiments confirm reduced bias and improved long-horizon forecasting performance. The work offers a theory-driven route to debiasing time-series training and suggests practical, scalable methods for improving temporal learning.
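As a worked illustration of the KL formalization and the Paradigm Paradox, the standard Gaussian calculation below assumes a zero-mean stationary Gaussian AR(1) process $x_t = \phi x_{t-1} + \varepsilon_t$ with $|\phi| < 1$. Because the marginals of a stationary process are identical, the product of marginals is exactly an i.i.d. model. This is a textbook identity offered for intuition, not the paper's Theorem 2.1 bound. The ratio $\rho$ is a hypothetical structure-to-innovation variance ratio and is not claimed to equal the paper's SSNR.

```latex
% Joint law p = N(0, \Sigma) versus its i.i.d. factorization q = N(0, D), D = diag(\Sigma):
\mathrm{KL}(p \,\|\, q)
  = \tfrac{1}{2}\left[\operatorname{tr}(D^{-1}\Sigma) - n + \log\det D - \log\det\Sigma\right]
  = -\tfrac{1}{2}\log\det R, \qquad R = D^{-1/2}\,\Sigma\,D^{-1/2}.

% Stationary AR(1): R_{ij} = \phi^{|i-j|} and \det R = (1-\phi^2)^{n-1}, hence
\mathrm{KL}(p \,\|\, q) = -\tfrac{n-1}{2}\log\!\left(1-\phi^2\right) = \tfrac{n-1}{2}\log(1+\rho),
\qquad \rho = \frac{\operatorname{Var}(\phi\,x_{t-1})}{\operatorname{Var}(\varepsilon_t)} = \frac{\phi^2}{1-\phi^2}.
```

In this toy case the divergence grows linearly in the length $n$ and diverges as $|\phi| \to 1$, that is, as the sequence becomes more predictable. This matches the direction of the Paradigm Paradox.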

Abstract

Optimizing time series models via point-wise loss functions (e.g., MSE) relies on a flawed point-wise independent and identically distributed (i.i.d.) assumption that disregards the causal temporal structure, an issue that is increasingly recognized yet lacks formal theoretical grounding. Focusing on the core independence issue under covariance stationarity, this paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB), formalizing it information-theoretically as the discrepancy between the true joint distribution and its flawed i.i.d. counterpart. Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias induced by point-wise loss functions. We derive the first closed-form quantification of the non-deterministic EOB across linear and non-linear systems, and prove that EOB is an intrinsic data property governed exclusively by sequence length and our proposed Structural Signal-to-Noise Ratio (SSNR). This theoretical diagnosis motivates our principled debiasing program, which eliminates the bias through sequence length reduction and structural orthogonalization. We present a concrete solution that realizes both principles simultaneously via DFT or DWT. Furthermore, we propose a novel harmonized $\ell_p$ norm framework to rectify the gradient pathologies of high-variance series. Extensive experiments validate the generality of the EOB theory and the superior performance of the debiasing program.
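For a zero-mean Gaussian vector, the KL divergence between the joint law and its i.i.d. (product-of-marginals) factorization reduces to $-\frac{1}{2}\log\det$ of the correlation matrix. The sketch below assumes a stationary Gaussian AR(1) process, which is an illustrative choice and not the paper's experimental setup. It compares that quantity for the raw sequence and for the sequence's coefficients in a real-valued orthonormal DFT basis, to show what "structural orthogonalization" buys. The helper names (`ar1_covariance`, `total_correlation`, `real_dft_basis`) are hypothetical. The sketch does not reproduce the paper's DFT/DWT debiasing implementation or its harmonized $\ell_p$ norm.

```python
import numpy as np


def ar1_covariance(n: int, phi: float, sigma2: float = 1.0) -> np.ndarray:
    """Stationary AR(1) covariance: Cov(x_i, x_j) = sigma2 * phi^|i-j| / (1 - phi^2)."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return sigma2 * phi ** lags / (1.0 - phi ** 2)


def total_correlation(cov: np.ndarray) -> float:
    """Gaussian KL(joint || product of marginals) = -0.5 * log det(correlation), in nats."""
    d = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d, d)
    sign, logdet = np.linalg.slogdet(corr)
    assert sign > 0, "covariance must be positive definite"
    return -0.5 * logdet


def real_dft_basis(n: int) -> np.ndarray:
    """Orthonormal real Fourier basis (one row per basis vector) for even n."""
    assert n % 2 == 0, "this sketch assumes an even length"
    t = np.arange(n)
    rows = [np.full(n, 1.0 / np.sqrt(n))]  # DC component
    for k in range(1, n // 2):  # paired cosine/sine components
        rows.append(np.sqrt(2.0 / n) * np.cos(2 * np.pi * k * t / n))
        rows.append(np.sqrt(2.0 / n) * np.sin(2 * np.pi * k * t / n))
    rows.append((-1.0) ** t / np.sqrt(n))  # Nyquist component
    return np.vstack(rows)


n, phi = 64, 0.9
cov_time = ar1_covariance(n, phi)
F = real_dft_basis(n)
cov_freq = F @ cov_time @ F.T  # covariance of the Fourier coefficients

tc_time = total_correlation(cov_time)
tc_freq = total_correlation(cov_freq)
closed_form = -0.5 * (n - 1) * np.log(1 - phi ** 2)  # AR(1) identity

print(f"time domain:      {tc_time:.2f} nats (closed form {closed_form:.2f})")
print(f"frequency domain: {tc_freq:.2f} nats")
```

The orthonormal basis preserves $\det\Sigma$, so any drop in the frequency-domain value comes entirely from the coefficients being closer to uncorrelated. The value does not reach zero because the DFT only approximately diagonalizes a finite Toeplitz covariance.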

Paper Structure

This paper contains 74 sections, 9 theorems, 113 equations, 8 figures, 5 tables.

Key Result

Theorem 2.1

(Bounds on the Expectation of Optimization Bias) Let $\{ x_t \}$ be a process admitting a Cramér decomposition $x_t = v_t + z_t$, where $\{ v_t \}$ is the deterministic component and $\{ z_t \}$ is the purely stochastic component. Assume the following conditions hold: 1. Determination: $v_t$ is perf

Figures (8)

  • Figure 1: Motivation of our work. Using a point-wise loss function leads the deep learning model to presume independent and identically distributed data as an approximation of the true joint distribution, which introduces a bias.
  • Figure 2: Error surfaces of a Transformer under Gaussian innovations. The blue and red arrows indicate the surface variation trend along horizon $h$ and the total SSNR $\textit{SSNR}_x$, respectively.
  • Figure 3: Taxonomies of Time Series Analysis.
  • Figure 4: Empirical verification of EOB Theory via a CNN model. The blue and red arrows indicate the surface variation trend along horizon $h$ and the total SSNR $\textit{SSNR}_x$, respectively.
  • Figure 5: Empirical verification of EOB Theory via an LSTM model. The blue and red arrows indicate the surface variation trend along horizon $h$ and the total SSNR $\textit{SSNR}_x$, respectively.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Theorem 2.1
  • Theorem 2.2
  • Proposition 2.3
  • Definition 2.4
  • Proposition 2.5
  • Theorem 2.6
  • Theorem 3.1
  • Definition 6.1
  • Lemma 4.1
  • Lemma 4.2
  • ...and 1 more