Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks
Thomas Bailie, Yun Sing Koh, S. Karthik Mukkavilli, Varvara Vetrova
TL;DR
The paper tackles information loss in hierarchical graph forecasting by introducing HiGFlow, a memory-enabled framework that propagates information across multiple resolutions through learnable embedding and lifting between graph levels. Theoretical results show that linear transitions increase feature-space smoothness while nonlinear transitions reduce it, and that a persistent memory buffer non-strictly increases Weisfeiler-Lehman expressivity beyond a $1$-WL hash as depth grows. Empirically, HiGFlow achieves state-of-the-art MAE and RMSE across diverse real-world datasets, outperforming both graph-based and transformer-based baselines, with deeper variants providing additional gains and ablations confirming the utility of embeddings, memory, and multi-resolution information. The approach offers a practical mechanism to model cross-scale spatiotemporal dependencies (e.g., global weather patterns) with improved predictive accuracy and a theoretically grounded boost in representational power.
Abstract
Graphical forecasting models learn the structure of time series data by projecting it onto a graph, with recent techniques capturing spatial-temporal associations between variables via edge weights. Hierarchical variants offer a distinct advantage by analysing the time series across multiple resolutions, making them particularly effective in tasks like global weather forecasting, where low-resolution variable interactions are significant. A critical challenge in hierarchical models is information loss during forward or backward passes through the hierarchy. We propose the Hierarchical Graph Flow (HiGFlow) network, which introduces a memory buffer variable of dynamic size to store previously seen information across variable resolutions. We theoretically show two key results: HiGFlow reduces smoothness when mapping onto new feature spaces in the hierarchy, and it non-strictly enhances the utility of message-passing by improving Weisfeiler-Lehman (WL) expressivity. Empirical results demonstrate that HiGFlow outperforms state-of-the-art baselines, including transformer models, by an average of at least 6.1% in MAE and 6.2% in RMSE. Code is available at https://github.com/TB862/HiGFlow.git.
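For readers unfamiliar with the WL expressivity mentioned above, the following is a minimal sketch of standard 1-WL colour refinement, the baseline test against which message-passing expressivity is usually measured. It is not HiGFlow's method and is not taken from the authors' code; the function name `wl_histogram`, the adjacency-dictionary input format, and the two example graphs are assumptions made for illustration only.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Standard 1-WL colour refinement (illustrative, not the paper's code).

    adj: dict mapping each node to a list of its neighbours (undirected).
    Returns a histogram of node colours after `rounds` refinement steps.
    """
    # Every node starts with the same colour.
    colors = {v: () for v in adj}
    for _ in range(rounds):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        # Colours are kept as nested tuples so they are comparable across graphs.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

# Hypothetical example graphs: a 6-cycle and two disjoint triangles.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A 6-node path, whose endpoints have degree 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

same = wl_histogram(hexagon) == wl_histogram(two_triangles)
differs = wl_histogram(hexagon) != wl_histogram(path)
```

Both the 6-cycle and the pair of triangles are 2-regular, so every node receives the same colour in every round and 1-WL cannot tell the two graphs apart, even though they are not isomorphic. The path is separated immediately because its endpoints have a different degree. Limits of this kind are what an expressivity gain beyond 1-WL refers to.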
