Two-stage hybrid models for enhancing forecasting accuracy on heterogeneous time series

Junru Ren, Shaomin Wu

TL;DR

This paper addresses forecasting across heterogeneous time series by introducing a two-stage framework that first learns a global tsGM on the entire set to capture homogeneous patterns, then models heterogeneity in the residuals. Stage two offers two paths: Type-I fits per-series tsLMs (e.g., ARIMA) on the residuals, while Type-II clusters the series by residual features (autocorrelation, nonlinearity, ARCH effects) and builds a cluster-specific sub-tsGM for each subgroup, with $R_h=\frac{n_h}{n}$ quantifying remaining heterogeneity. The authors provide theoretical insights, including a gradient-descent interpretation of stage-two improvements and a generalisation bound, and validate the approach on four open datasets, reporting significant gains over six baselines. The results suggest that appropriately clustering residuals and selecting between the Type-I and Type-II strategies yields practical gains in forecast accuracy, albeit with higher computational costs for the more expressive Type-II variants.

Abstract

A time series forecasting model, which is typically built on a single time series, is known as a local time series model (tsLM). In contrast, a forecasting model trained on multiple time series is referred to as a global time series model (tsGM). tsGMs can enhance forecasting accuracy and improve generalisation by learning cross-series information. As such, developing tsGMs has become a prominent research focus within the time series forecasting community. However, the benefits of tsGMs may not always be realised if the given set of time series is heterogeneous. While increasing model complexity can help tsGMs adapt to such a set of data, it can also increase the risk of overfitting and forecasting error. Additionally, the definition of homogeneity remains ambiguous in the literature. To address these challenges, this paper explores how to define data heterogeneity and proposes a two-stage modelling framework: at stage one, a tsGM is learnt to identify homogeneous patterns; at stage two, tsLMs (e.g., ARIMA) or sub-tsGMs tailored to different groups are learnt to capture the heterogeneity. Numerical experiments on four open datasets demonstrate that the proposed approach significantly outperforms six state-of-the-art models. These results highlight its effectiveness in unlocking the full potential of global forecasting models for heterogeneous datasets.
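
The two-stage idea can be illustrated with a minimal sketch of the Type-I path. This is not the authors' implementation: it assumes a pooled linear autoregression as a stand-in for the tsGM and a zero-mean AR(1) as a stand-in for the per-series tsLM (the abstract mentions ARIMA). All function names and the lag order `p` are hypothetical choices made for illustration.

```python
import numpy as np

def make_windows(series, p):
    """Return (X, y): rows of p lagged values and the value that follows each row."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - p:t] for t in range(p, len(series))])
    y = series[p:]
    return X, y

def fit_global(series_list, p):
    """Stage one (assumed stand-in): one linear AR(p) fitted on windows pooled from all series."""
    Xs, ys = zip(*(make_windows(s, p) for s in series_list))
    X = np.vstack(Xs)
    y = np.concatenate(ys)
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_global(coef, X):
    """Apply the pooled model: lag weights plus intercept."""
    return X @ coef[:-1] + coef[-1]

def fit_local_ar1(resid):
    """Stage two, Type-I (assumed stand-in): zero-mean AR(1) on one series' residuals."""
    denom = float(np.dot(resid[:-1], resid[:-1]))
    return 0.0 if denom == 0.0 else float(np.dot(resid[1:], resid[:-1]) / denom)

def two_stage_forecast(series_list, p=3):
    """One-step-ahead forecast per series: global forecast plus local residual forecast."""
    coef = fit_global(series_list, p)
    forecasts = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        X, y = make_windows(s, p)
        resid = y - predict_global(coef, X)      # in-sample stage-one residuals
        phi = fit_local_ar1(resid)               # per-series residual dynamics
        base = predict_global(coef, s[-p:][None, :])[0]
        forecasts.append(base + phi * resid[-1])
    return np.array(forecasts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [np.cumsum(rng.normal(size=50)) for _ in range(4)]  # synthetic random walks
    print(two_stage_forecast(data, p=3))
```

A Type-II variant would replace `fit_local_ar1` with a step that clusters the series by residual features and fits one model per cluster; that step is omitted here because the clustering details are not given on this page.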

Paper Structure

This paper contains 21 sections, 3 theorems, 7 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Based on the tsGM trained at stage one, stage two further reduces the MSE loss defined in Equation (MSE) by descending the gradient in the function space.
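
The following sketch shows why fitting residuals can be read as a gradient step. The paper's Equation (MSE) is not reproduced on this page, so the notation here is an assumption made for illustration: $f_1$ denotes the stage-one tsGM, $g$ the stage-two model, and $(x_i, y_i)$, $i = 1, \dots, N$, the training samples.

$$L(f) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - f(x_i)\bigr)^2, \qquad -\left.\frac{\partial L}{\partial f(x_i)}\right|_{f=f_1} = \frac{2}{N}\bigl(y_i - f_1(x_i)\bigr).$$

Under this assumed form, the negative gradient at each sample point is proportional to the stage-one residual. A stage-two model $g$ fitted to those residuals and added to give $f_2 = f_1 + g$ therefore moves $f_1$ along a descent direction of the loss, in the same spirit as gradient boosting.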

Figures (5)

  • Figure 1: The two-stage modelling framework.
  • Figure 2: The CD diagram to visualise the differences among four LSTM-involved models in terms of mean sMAPE.
  • Figure 3: Different strategies to address heterogeneity.
  • Figure 4: Mean and median RMSE, MAE and sMAPE on M3-industry with different $K$ (square markers and the solid horizontal line denote the mean error).
  • Figure 5: The plots of true values and forecasts on the Tourism dataset, where the red vertical dashed line separates the training and the test datasets.

Theorems & Definitions (6)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof