CoRe: Coherency Regularization for Hierarchical Time Series
Rares Cristian, Pavithra Harsha, Georgia Perakis, Brian Quanz
TL;DR
CoRe introduces a principled, network-level coherency regularization for hierarchical time series forecasting, enabling soft coherence across all levels without enforcing hard aggregation constraints. By tying coherency to the final layer through $l_c(L)=\left\|W_L-AW_L\right\|+\left\|b_L-Ab_L\right\|$ and leveraging batch normalization, the method bounds the coherency loss $c(\hat{y})=\left\|\hat{y}-A\hat{y}\right\|$ with high probability, and its effect can be traded off against traditional MSE via a tunable weight $w$. The framework extends naturally to distributional forecasts using VAEs, dropout, or other generative models, ensuring that each sample is (softly) coherent; experiments on Traffic, Labour, and Tourism show improved generalization and coherency, including in noisy or out-of-distribution settings. Overall, CoRe offers a robust, architecture-agnostic approach to coherent hierarchical forecasting with practical benefits for point and probabilistic predictions.
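To make the two quantities above concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a linear final layer $\hat{y}=W_L h+b_L$ and a toy three-series hierarchy. It also assumes that $A$ rebuilds every level from the bottom level, so that $A=SP$ for a summing matrix $S$ and a bottom-level selector $P$; the summary does not define $A$, and this is one plausible reading. The function names, the choice of the spectral norm for the weight term, and the random stand-in for the hidden activation $h$ are all hypothetical. Under these assumptions, $\hat{y}-A\hat{y}=(W_L-AW_L)h+(b_L-Ab_L)$, so $c(\hat{y})\le\left\|W_L-AW_L\right\|\,\|h\|+\left\|b_L-Ab_L\right\|$, which the sketch checks numerically.

```python
import numpy as np

# Toy hierarchy (assumed for illustration): y = [total, b1, b2], total = b1 + b2.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])        # summing matrix: bottom level -> all levels
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])   # selects the bottom-level entries of y
A = S @ P                         # rebuilds every level from the bottom level


def coherency_loss(y_hat, A):
    """c(y_hat) = ||y_hat - A y_hat||, zero exactly when y_hat is coherent."""
    return np.linalg.norm(y_hat - A @ y_hat)


def layer_regularizer(W, b, A):
    """l_c(L) = ||W - A W|| + ||b - A b|| (spectral norm assumed for W)."""
    return np.linalg.norm(W - A @ W, ord=2) + np.linalg.norm(b - A @ b)


def total_loss(y_hat, y, W, b, A, w):
    """MSE traded off against the coherency regularizer via the weight w."""
    mse = np.mean((y_hat - y) ** 2)
    return mse + w * layer_regularizer(W, b, A)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # final-layer weights: 3 series, 4 hidden units
b = rng.normal(size=3)        # final-layer bias
h = rng.normal(size=4)        # stand-in for the last hidden activation
y_hat = W @ h + b

# The forecast's incoherence is bounded by the regularizer terms and ||h||.
bound = (np.linalg.norm(W - A @ W, ord=2) * np.linalg.norm(h)
         + np.linalg.norm(b - A @ b))
assert coherency_loss(y_hat, A) <= bound + 1e-12

# Weights with zero regularizer give coherent forecasts for any h.
W_coh, b_coh = A @ W, A @ b
assert layer_regularizer(W_coh, b_coh, A) < 1e-12
assert coherency_loss(W_coh @ h + b_coh, A) < 1e-12
```

The bound depends on $\|h\|$, which is presumably where batch normalization enters: if the last hidden activation is normalized, its norm is controlled, and a small $l_c(L)$ then implies a small $c(\hat{y})$ even on inputs not seen during training.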
Abstract
Hierarchical time series forecasting presents unique challenges, particularly when dealing with noisy data that may not perfectly adhere to aggregation constraints. This paper introduces a novel approach to soft coherency in hierarchical time series forecasting using neural networks. We present a network coherency regularization method, which we denote as CoRe (Coherency Regularization), a technique that trains neural networks to produce forecasts that are inherently coherent across hierarchies, without strictly enforcing aggregation constraints. Our method offers several key advantages. (1) It provides theoretical guarantees on the coherency of forecasts, even for out-of-sample data. (2) It is adaptable to scenarios where data may contain errors or missing values, making it more robust than strict coherency methods. (3) It can be easily integrated into existing neural network architectures for time series forecasting. We demonstrate the effectiveness of our approach on multiple benchmark datasets, comparing it against state-of-the-art methods in both coherent and noisy data scenarios. Additionally, our method can be used within existing generative probabilistic forecasting frameworks to generate coherent probabilistic forecasts. Our results show improved generalization and forecast accuracy, particularly in the presence of data inconsistencies. On a variety of datasets, including both strictly hierarchically coherent and noisy data, our training method has either equal or better accuracy at all levels of the hierarchy while being strictly more coherent out-of-sample than existing soft-coherency methods.
