Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting
Andrea Cini, Danilo Mandic, Cesare Alippi
TL;DR
This work addresses the challenge of forecasting multiple related time series by unifying hierarchical constraints with relational dependencies in a single learnable framework. It introduces a pyramidal graph neural network that embeds hierarchy learning and graph pooling directly into the forecasting model, coupled with a differentiable forecast reconciliation layer that ensures coherency across levels. Key contributions include (i) embedding hierarchical constraints as inductive biases within graph-based forecasting, (ii) end-to-end learning of the hierarchy via trainable clustering and regularization, and (iii) a differentiable reconciliation mechanism that improves accuracy by projecting forecasts onto the coherent subspace. Empirical results on multiple benchmarks show that the method compares favorably against the state of the art and that the learned clusters capture meaningful, interpretable dynamics, which supports jointly modeling hierarchical structure and relational dependencies in time series forecasting.
Abstract
Relationships among time series can be exploited as inductive biases in learning effective forecasting models. In hierarchical time series, relationships among subsets of sequences induce hard constraints (hierarchical inductive biases) on the predicted values. In this paper, we propose a graph-based methodology to unify relational and hierarchical inductive biases in the context of deep learning for time series forecasting. In particular, we model both types of relationships as dependencies in a pyramidal graph structure, with each pyramidal layer corresponding to a level of the hierarchy. By exploiting modern, trainable graph pooling operators, we show that the hierarchical structure, if not available as a prior, can be learned directly from data, thus obtaining cluster assignments aligned with the forecasting objective. A differentiable reconciliation stage is incorporated into the processing architecture, allowing hierarchical constraints to act both as an architectural bias and as a regularization element for predictions. Simulation results on representative datasets show that the proposed method compares favorably against the state of the art.
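
To make the notion of coherency concrete, the following is a minimal sketch of forecast reconciliation by orthogonal projection, written with NumPy. It is an illustration under stated assumptions, not the architecture of the paper: the cluster assignments are fixed and hard (one-hot) rather than learned by a pooling operator, the hierarchy has three levels (total, clusters, bottom series), and the projection used is the standard least-squares one. The names `summing_matrix` and `reconcile` are hypothetical.

```python
import numpy as np


def summing_matrix(assignments: np.ndarray) -> np.ndarray:
    """Build S such that y_all = S @ y_bottom for a 3-level hierarchy.

    assignments: (n_bottom, n_clusters) one-hot matrix (assumed fixed here;
    in a learned setting it would come from a trainable pooling operator).
    Rows of S: [total, one row per cluster, one row per bottom series].
    """
    n_bottom = assignments.shape[0]
    total = np.ones((1, n_bottom))
    return np.vstack([total, assignments.T, np.eye(n_bottom)])


def reconcile(y_hat: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Project forecasts for all levels onto the column space of S.

    The column space of S is the coherent subspace: every vector in it
    satisfies the aggregation constraints. The map is linear, hence
    differentiable with respect to y_hat.
    """
    P = S @ np.linalg.pinv(S)  # orthogonal projector onto span(S)
    return P @ y_hat


# Four bottom-level series grouped into two clusters (illustrative values).
assignments = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
S = summing_matrix(assignments)

# Independent (incoherent) forecasts for total, 2 clusters, 4 bottom series.
rng = np.random.default_rng(0)
y_hat = rng.normal(size=S.shape[0])

y_rec = reconcile(y_hat, S)
bottom = y_rec[3:]
total_gap = abs(y_rec[0] - bottom.sum())
cluster_gap = np.abs(y_rec[1:3] - assignments.T @ bottom).max()
```

After projection, `total_gap` and `cluster_gap` are zero up to floating-point error, meaning that the reconciled total and cluster forecasts equal the sums of their reconciled members. Because the projection is a fixed linear map for a given `S`, gradients can flow through it, which is what allows a reconciliation step of this kind to sit inside an end-to-end trained model.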
