RobustTSF: Towards Theory and Design of Robust Time Series Forecasting with Anomalies

Hao Cheng, Qingsong Wen, Yang Liu, Liang Sun

TL;DR

This work defines and analyzes robust time series forecasting in the presence of anomalies (TSFA) by formalizing three anomaly types: Constant, Missing, and Gaussian. It bridges TSFA with Learning with Noisy Labels (LNL) to derive a principled RobustTSF algorithm that avoids imputation and instead selects informative samples via trend-filter-based anomaly scores and a robust loss, primarily MAE. Theoretical results show MAE offers robustness under certain anomaly patterns, while experiments across Electricity, Traffic, and other datasets demonstrate state-of-the-art performance and model-agnostic applicability. The approach yields practical robustness, efficiency, and broader implications for using LNL insights in regression-based forecasting tasks.
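The selection idea summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a centered rolling median stands in for the trend filter, the anomaly score is an unweighted mean absolute deviation from that trend, and the names `rolling_median_trend`, `anomaly_score`, `select_samples`, and the threshold `tau` are hypothetical.

```python
import statistics


def rolling_median_trend(window, half_width=2):
    """Estimate a trend with a centered rolling median (assumed stand-in for a trend filter)."""
    n = len(window)
    return [
        statistics.median(window[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def anomaly_score(window, half_width=2):
    """Mean absolute deviation of the window from its estimated trend."""
    trend = rolling_median_trend(window, half_width)
    return sum(abs(x - s) for x, s in zip(window, trend)) / len(window)


def select_samples(series, input_len, horizon, tau):
    """Keep (input, target) pairs whose input window scores below the threshold tau."""
    samples = []
    for start in range(len(series) - input_len - horizon + 1):
        inputs = series[start:start + input_len]
        targets = series[start + input_len:start + input_len + horizon]
        if anomaly_score(inputs) < tau:
            samples.append((inputs, targets))
    return samples


def mae(predictions, targets):
    """Mean absolute error, the robust loss named in the summary."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Toy data: a linear ramp with a single spike at step 20.
series = [0.1 * i for i in range(40)]
series[20] += 5.0

# Windows whose inputs contain the spike are dropped; no imputation is performed.
kept = select_samples(series, input_len=16, horizon=4, tau=0.1)

# MAE of a naive "repeat the last input value" forecast on each kept sample.
naive_errors = [mae([inputs[-1]] * len(targets), targets) for inputs, targets in kept]
```

In this sketch the selection step inspects only the input window, so a kept sample can still carry an anomalous target; handling those is left to the robust loss.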

Abstract

Time series forecasting is an important task at the forefront of many real-world applications. However, most time series forecasting techniques assume that the training data is clean and free of anomalies. This assumption is unrealistic because collected time series data can be contaminated in practice. A forecasting model trained directly on time series with anomalies will perform worse. It is therefore essential to develop methods that automatically learn a robust forecasting model from contaminated data. In this paper, we first statistically define three types of anomalies, then theoretically and experimentally analyze loss robustness and sample robustness when these anomalies exist. Based on our analyses, we propose a simple and efficient algorithm to learn a robust forecasting model. Extensive experiments show that our method is highly robust and outperforms all existing approaches. The code is available at https://github.com/haochenglouis/RobustTSF.

Paper Structure

This paper contains 41 sections, 4 theorems, 16 equations, 8 figures, 30 tables, 1 algorithm.

Key Result

Theorem 1

Let $\ell$ be the loss function and $f$ be the forecasting model. Under Constant and Missing type anomalies with anomaly rate $\eta < 0.5$, if for each ${\bm{x}}$, $\ell(f({\bm{x}}),y_{{\bm{x}}}) + \ell(f({\bm{x}}),y_{{\bm{x}}}^{A}) = C_{{\bm{x}}}$, where $C_{{\bm{x}}}$ is constant with respect to the [...] where $\gamma_1>0$ and $\gamma_2$ are constants with respect to $f$.
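
To see why a condition of this form can yield robustness, consider a simplified setting that is an assumption of this sketch rather than the paper's stated anomaly model: each target $y_{\bm{x}}$ is replaced by its anomalous counterpart $y_{\bm{x}}^{A}$ independently with probability $\eta$. The risk symbols $R_{\ell}$ (clean) and $R^{\eta}_{\ell}$ (under anomalies) are notation introduced here for illustration.

```latex
\begin{align*}
R_{\ell}(f) &= \mathbb{E}_{\bm{x}}\big[\ell(f(\bm{x}), y_{\bm{x}})\big], \\
R^{\eta}_{\ell}(f) &= \mathbb{E}_{\bm{x}}\big[(1-\eta)\,\ell(f(\bm{x}), y_{\bm{x}}) + \eta\,\ell(f(\bm{x}), y_{\bm{x}}^{A})\big] \\
&= \mathbb{E}_{\bm{x}}\big[(1-\eta)\,\ell(f(\bm{x}), y_{\bm{x}}) + \eta\,\big(C_{\bm{x}} - \ell(f(\bm{x}), y_{\bm{x}})\big)\big] \\
&= (1-2\eta)\,R_{\ell}(f) + \eta\,\mathbb{E}_{\bm{x}}\big[C_{\bm{x}}\big].
\end{align*}
```

Under these assumptions the coefficient $1-2\eta$ is positive exactly when $\eta < 0.5$, and the additive term $\eta\,\mathbb{E}_{\bm{x}}[C_{\bm{x}}]$ does not depend on $f$, so both risks share the same minimizers. This is one reading consistent with the roles of $\gamma_1 > 0$ and $\gamma_2$ in the statement.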

Figures (8)

  • Figure 1: Visualization of time series with Gaussian anomalies in different positions. (a): Clean input time series which is the sine function of time steps. We normalize the time series to 0 mean and 1 std. (b) (c) (d): Time series (sine) with Gaussian anomalies located in front, middle and back of the input time series, respectively.
  • Figure 2: Visualization of time series with different types of anomalies. (a): Clean time series which is the sine function of time steps. We normalize the time series to 0 mean and 1 std. (b) (c) (d): Time series (sine) with Constant, Missing, and Gaussian type anomalies. The noise rate for these types of anomalies is 0.2. The noise scale is 1.0 for Constant and Gaussian anomaly and 0 for Missing anomaly.
  • Figure 3: (a) The forecasting result of offline imputation. (b) The forecasting result of loss-based sample selection. (c) The forecasting result of RobustTSF. The length of the input time series is 16 and the forecasting horizon is 4.
  • Figure 4: Sensitivity of $\delta$ with Gaussian and Missing anomaly for offline detection-imputation-retraining pipeline.
  • Figure 5: The figure illustrates experiments concerning the hyperparameters $\tau$ and $\lambda$ for RobustTSF on both the Electricity and Traffic datasets, each containing different types of anomalies. The first row of subfigures corresponds to the Electricity dataset, while the second row corresponds to the Traffic dataset. Notably, the results indicate that RobustTSF remains robust across various settings of hyperparameters. We reiterate that we maintain consistent hyperparameters across all settings when comparing with other methods.
  • ...and 3 more figures
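
The setup described in the captions of Figures 1 and 2 (a normalized sine wave contaminated at a noise rate of 0.2) can be approximated with the sketch below. The per-type definitions are assumptions inferred from the captions: Constant adds a fixed offset, Missing overwrites the value with a fixed constant (0 in Figure 2), and Gaussian adds zero-mean noise with standard deviation `scale`. Anomalous steps are drawn independently here, which is a simplification, and `inject_anomalies` is a hypothetical helper name.

```python
import math
import random
import statistics

KINDS = ("constant", "missing", "gaussian")


def inject_anomalies(series, kind, rate=0.2, scale=1.0, seed=0):
    """Return a contaminated copy of `series` and a per-step anomaly mask."""
    if kind not in KINDS:
        raise ValueError(f"unknown anomaly kind: {kind!r}")
    rng = random.Random(seed)
    contaminated, mask = [], []
    for value in series:
        # Assumed simplification: each step is anomalous independently with probability `rate`.
        is_anomaly = rng.random() < rate
        if is_anomaly and kind == "constant":
            value = value + scale                   # fixed offset
        elif is_anomaly and kind == "missing":
            value = scale                           # overwritten by a fixed value
        elif is_anomaly and kind == "gaussian":
            value = value + rng.gauss(0.0, scale)   # zero-mean Gaussian noise
        contaminated.append(value)
        mask.append(is_anomaly)
    return contaminated, mask


# Clean series: a sine wave normalized to zero mean and unit standard deviation.
raw = [math.sin(2 * math.pi * t / 50) for t in range(200)]
mean, std = statistics.fmean(raw), statistics.pstdev(raw)
clean = [(v - mean) / std for v in raw]

constant_x, constant_mask = inject_anomalies(clean, "constant", rate=0.2, scale=1.0)
missing_x, missing_mask = inject_anomalies(clean, "missing", rate=0.2, scale=0.0)
gaussian_x, gaussian_mask = inject_anomalies(clean, "gaussian", rate=0.2, scale=1.0)
```

Moving the anomalous steps to the front, middle, or back of an input window, as in Figure 1, would only require replacing the random mask with a fixed index range.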

Theorems & Definitions (4)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Theorem 2