RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting

Jieting Wang, Xiaolei Shang, Feijiang Li, Furong Peng

TL;DR

This work tackles the limitations of mean-squared-error losses in time-series forecasting by introducing RI-Loss, a kernel-based objective that enforces dependence between model residuals and random time-series noise using HSIC. The authors derive a non-asymptotic HSIC bound with double-sample Rademacher complexities and Bernstein-type concentration, providing rigorous generalization guarantees for the loss. Empirically, RI-Loss yields consistent improvements across eight real-world datasets and five backbone models (including Transformer and MLP architectures), while remaining competitive in runtime. The approach offers a principled, noise-aware framework for long-horizon forecasting with broad practical impact and publicly available code.

Abstract

Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches (including Transformer-based and multilayer perceptron-based models) are optimized with Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. The code is publicly available at: https://github.com/shang-xl/RI-Loss.
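
To make the HSIC ingredient of the abstract concrete, the following is a minimal sketch of the standard biased empirical HSIC estimator, $\mathrm{tr}(KHLH)/(n-1)^2$, applied to a toy residual sequence and a random noise sequence. It is not the authors' RI-Loss implementation: the Gaussian kernel, the bandwidth `sigma`, the sample size, and the helper names `rbf_gram` and `empirical_hsic` are all assumptions made for illustration, and the paper's own estimator and normalization may differ.

```python
import numpy as np

def rbf_gram(z, sigma=1.0):
    """Gaussian (RBF) Gram matrix; rows of z are samples (kernel choice is an assumption)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    sq_norms = np.sum(z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * z @ z.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def empirical_hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with H the centering matrix."""
    k = rbf_gram(x, sigma)
    l = rbf_gram(y, sigma)
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2

# Toy data (hypothetical): a noise sequence, a residual that tracks it closely,
# and a residual drawn independently of it.
rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 1))
dependent_residual = noise + 0.1 * rng.normal(size=(200, 1))
independent_residual = rng.normal(size=(200, 1))

hsic_dep = empirical_hsic(dependent_residual, noise)
hsic_ind = empirical_hsic(independent_residual, noise)
print(f"HSIC (dependent residual):   {hsic_dep:.5f}")
print(f"HSIC (independent residual): {hsic_ind:.5f}")
```

In this toy setting the estimate is larger for the residual that tracks the noise than for the independent one, which is the property a dependence-based objective relies on. How RI-Loss combines such a term with the forecasting model and its hyperparameter $\lambda$ is specified in the paper and the linked repository, not here.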

Paper Structure

This paper contains 54 sections, 11 theorems, 56 equations, 9 figures, 10 tables, 1 algorithm.

Key Result

Theorem 1

Let $\bm{Y} = (Y_{t+1}, \dots, Y_{t+H})^\top \in \mathbb{R}^H$ be a noisy observation vector generated by $\bm{Y} = h(\bm{X}_t) + \bm{\epsilon}$, where $h: \mathcal{X} \to \mathbb{R}^H$ is a deterministic mapping and the noise vector $\bm{\epsilon} \in \mathbb{R}^H$ satisfies: $\mathbb{E}\dots$ [...], where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, defined as the sum of its diagonal elements [...]

Figures (9)

  • Figure 1: The RI loss varies with the noise ratio.
  • Figure 2: RI-Loss Based Time Series Model.
  • Figure 3: The CD plots on two evaluation metrics (MSE and MAE) with a significance level at $\alpha= 0.05$.
  • Figure 4: The impact of the hyperparameter $\lambda$ on iTransformer.
  • Figure 5: Forecasting visualization comparing RI-Loss and MSE loss as objective functions under the input-336-predict-336 settings. Blue lines are the ground truths and orange lines are the model predictions. Panels (a) and (b) correspond to ETTh1, with panels (c) and (d) representing ETTh2.
  • ...and 4 more figures

Theorems & Definitions (22)

  • Definition 1: Population HSIC
  • Definition 2: Empirical HSIC Estimator
  • Theorem 1: Cross-Term Expectation for Linear Projection
  • Definition 3
  • Theorem 2
  • Theorem 3: Cross-Term Expectation for Linear Projection
  • proof
  • Definition 4: Empirical Rademacher Complexity
  • Lemma 1
  • Theorem 4
  • ...and 12 more