CoRe: Coherency Regularization for Hierarchical Time Series

Rares Cristian, Pavithra Harsha, Georgia Perakis, Brian Quanz

TL;DR

CoRe introduces a principled, network-level coherency regularization for hierarchical time series forecasting, enabling soft coherence across all levels without enforcing hard aggregation constraints. By tying coherency to the final layer through $l_c(L)=\left\|W_L-AW_L\right\|+\left\|b_L-Ab_L\right\|$ and leveraging batch normalization, the method bounds the coherency loss $c(\hat{y})=\left\|\hat{y}-A\hat{y}\right\|$ with high probability, and its effect can be traded off against traditional MSE via a tunable weight $w$. The framework extends naturally to distributional forecasts using VAEs, dropout, or other generative models, ensuring that each sample is (softly) coherent; experiments on Traffic, Labour, and Tourism show improved generalization and coherency, including in noisy or out-of-distribution settings. Overall, CoRe offers a robust, architecture-agnostic approach to coherent hierarchical forecasting with practical benefits for point and probabilistic predictions.
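To make the two quantities above concrete, here is a minimal NumPy sketch that evaluates $l_c(L)$ and $c(\hat{y})$ on a toy hierarchy. Several things are assumptions for illustration and are not taken from the paper: the three-series hierarchy ($y_1 = y_2 + y_3$), the reading of $A$ as the matrix that rebuilds every series from the bottom-level series (so a coherent vector satisfies $y = Ay$), the choice of Frobenius and Euclidean norms, and the function names `network_coherency_loss` and `coherency`.

```python
import numpy as np

# Hypothetical 3-series hierarchy: y1 = y2 + y3.
# Assumption: A rebuilds every series from the bottom-level series,
# so a vector y is coherent exactly when y == A @ y.
A = np.array([[0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def network_coherency_loss(W_L, b_L, A):
    """l_c(L) = ||W_L - A W_L|| + ||b_L - A b_L|| (Frobenius / Euclidean norms assumed)."""
    return np.linalg.norm(W_L - A @ W_L) + np.linalg.norm(b_L - A @ b_L)

def coherency(y_hat, A):
    """c(y_hat) = ||y_hat - A y_hat||; zero for a perfectly coherent forecast."""
    return np.linalg.norm(y_hat - A @ y_hat)

rng = np.random.default_rng(0)
W_L = rng.normal(size=(3, 4))   # final-layer weights: 3 series, 4 hidden features
b_L = rng.normal(size=3)        # final-layer bias
z = rng.normal(size=4)          # stand-in for the batch-normalized input to layer L

# Unconstrained weights: the forecast is generally incoherent.
y_hat = W_L @ z + b_L
print("l_c(L):", network_coherency_loss(W_L, b_L, A), " c(y_hat):", coherency(y_hat, A))

# Weights with l_c(L) = 0: every forecast is coherent, whatever z is.
W_coh, b_coh = A @ W_L, A @ b_L
y_coh = W_coh @ z + b_coh
print("l_c(L):", network_coherency_loss(W_coh, b_coh, A), " c(y_coh):", coherency(y_coh, A))
```

In this sketch, driving $l_c(L)$ to zero makes the final layer itself coherent, so the forecast is coherent for any input $z$; a small but nonzero $l_c(L)$ corresponds to the soft coherence described above.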

Abstract

Hierarchical time series forecasting presents unique challenges, particularly when dealing with noisy data that may not perfectly adhere to aggregation constraints. This paper introduces a novel approach to soft coherency in hierarchical time series forecasting using neural networks. We present a network coherency regularization method, which we denote as CoRe (Coherency Regularization), a technique that trains neural networks to produce forecasts that are inherently coherent across hierarchies, without strictly enforcing aggregation constraints. Our method offers several key advantages. (1) It provides theoretical guarantees on the coherency of forecasts, even for out-of-sample data. (2) It is adaptable to scenarios where data may contain errors or missing values, making it more robust than strict coherency methods. (3) It can be easily integrated into existing neural network architectures for time series forecasting. We demonstrate the effectiveness of our approach on multiple benchmark datasets, comparing it against state-of-the-art methods in both coherent and noisy data scenarios. Additionally, our method can be used within existing generative probabilistic forecasting frameworks to generate coherent probabilistic forecasts. Our results show improved generalization and forecast accuracy, particularly in the presence of data inconsistencies. On a variety of datasets, including both strictly hierarchically coherent and noisy data, our training method has either equal or better accuracy at all levels of the hierarchy while being strictly more coherent out-of-sample than existing soft-coherency methods.

Paper Structure

This paper contains 32 sections, 1 theorem, 7 equations, 9 figures, 7 tables.

Key Result

Proposition 1

If a batch-normalization layer is applied directly before the final layer $L$, then the network coherency regularization (Definition 1) bounds the coherency $c(\hat{y}) = \left\|\hat{y}-A\hat{y}\right\|$ of any output prediction $\hat{y}$ as follows. For any $\delta \geq 1$, the coherency of any prediction $\hat{y}$ satisfies $\mathbb{P}\biggl(c(\hat{y}) \leq \delta \cdot l_c(L)\biggr) \geq 1 - 4\exp\left(
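A short derivation suggests why a bound of this shape is plausible. It is a sketch under stated assumptions, not the paper's proof: it assumes the final layer computes $\hat{y} = W_L z + b_L$ from the batch-normalized input $z$, a Euclidean vector norm, and a matrix norm compatible with it (for example spectral or Frobenius).

$$
\begin{aligned}
\hat{y} - A\hat{y} &= (W_L - AW_L)\,z + (b_L - Ab_L), \\
c(\hat{y}) = \left\|\hat{y} - A\hat{y}\right\| &\leq \left\|W_L - AW_L\right\|\,\left\|z\right\| + \left\|b_L - Ab_L\right\| \\
&\leq \max\bigl(1, \left\|z\right\|\bigr)\, l_c(L).
\end{aligned}
$$

Under these assumptions, the only data-dependent factor is $\left\|z\right\|$. Batch normalization standardizes $z$, so its norm can be controlled with high probability, which is where a probabilistic statement with a multiplier $\delta \geq 1$ would come from. The exact scaling of $\delta$ and the probability term in the proposition may differ from this sketch.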

Figures (9)

  • Figure 1: An example time series hierarchy with three levels and eight series. For example, the series $y_1$ at the top level is an aggregation of the lower-level series $y_2, y_3$.
  • Figure 2: Network and loss architecture. Any neural network architecture can be used, followed by a final batch-normalization layer whose output we denote by $z$. This output is passed through a final linear layer $L$ with weights $W_L$ and bias $b_L$ to produce the final output $\hat{y}$. The structural coherency loss $l_c(L)$ is added to the traditional mean-squared-error loss.
  • Figure 3: VAE-based distributional forecasting for network coherency regularization.
  • Figure 4: The validation score follows the trend of the coherency metric rather than the training score, indicating that coherency is crucial for generalization.
  • Figure 5: Validation score as coherency weight increases.
  • ...and 4 more figures
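
The architecture described in the Figure 2 caption can be sketched as a forward pass plus a combined loss. This is an illustrative NumPy sketch, not the authors' implementation. The one-hidden-layer `tanh` backbone, the batch normalization without learned scale and shift, the toy hierarchy $y_1 = y_2 + y_3$ with its matrix $A$, and the additive form `mse + w * l_c` for the trade-off weight $w$ are all assumptions.

```python
import numpy as np

A = np.array([[0.0, 1.0, 1.0],   # assumed toy hierarchy: y1 = y2 + y3
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def batch_norm(h, eps=1e-5):
    """Standardize each feature over the batch (no learned scale/shift, for brevity)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def forward(x, V, W_L, b_L):
    h = np.tanh(x @ V)           # stand-in backbone; any architecture could go here
    z = batch_norm(h)            # z: batch-normalized input to the final layer L
    y_hat = z @ W_L.T + b_L      # final linear layer, shape (batch, n_series)
    return z, y_hat

def total_loss(y_hat, y, W_L, b_L, A, w):
    """Assumed combination: MSE plus w times the structural coherency loss l_c(L)."""
    mse = np.mean((y_hat - y) ** 2)
    l_c = np.linalg.norm(W_L - A @ W_L) + np.linalg.norm(b_L - A @ b_L)
    return mse + w * l_c, mse, l_c

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 5))                        # batch of 16 inputs, 5 features
V = rng.normal(size=(5, 4))                         # backbone weights
W_L = rng.normal(size=(3, 4))                       # final-layer weights
b_L = rng.normal(size=3)                            # final-layer bias
bottom = rng.normal(size=(16, 2))
y = np.column_stack([bottom.sum(axis=1), bottom])   # coherent synthetic targets

z, y_hat = forward(x, V, W_L, b_L)
for w in (0.0, 0.1, 1.0):
    loss, mse, l_c = total_loss(y_hat, y, W_L, b_L, A, w)
    print(f"w={w}: mse={mse:.3f}  l_c={l_c:.3f}  total={loss:.3f}")
```

Note that the coherency term depends only on the final-layer parameters, not on the batch, so in this sketch it acts like a weight regularizer that can be attached to any backbone.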

Theorems & Definitions (2)

  • Definition 1: network coherency regularization
  • Proposition 1