Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks

Thomas Bailie, Yun Sing Koh, S. Karthik Mukkavilli, Varvara Vetrova

TL;DR

The paper tackles information loss in hierarchical graph forecasting by introducing HiGFlow, a memory-enabled framework that propagates information across multiple resolutions through learnable embedding and lifting between graph levels. Theoretical results show that linear transitions increase feature-space smoothness while nonlinear transitions reduce it, and that a persistent memory buffer non-strictly increases Weisfeiler-Lehman expressivity beyond a $1$-WL hash as depth grows. Empirically, HiGFlow achieves state-of-the-art MAE and RMSE across diverse real-world datasets, outperforming both graph-based and transformer-based baselines, with deeper variants providing additional gains and ablations confirming the utility of embeddings, memory, and multi-resolution information. The approach offers a practical mechanism to model cross-scale spatiotemporal dependencies (e.g., global weather patterns) with improved predictive accuracy and a theoretically grounded boost in representational power.

Abstract

Graphical forecasting models learn the structure of time series data via projecting onto a graph, with recent techniques capturing spatial-temporal associations between variables via edge weights. Hierarchical variants offer a distinct advantage by analysing the time series across multiple resolutions, making them particularly effective in tasks like global weather forecasting, where low-resolution variable interactions are significant. A critical challenge in hierarchical models is information loss during forward or backward passes through the hierarchy. We propose the Hierarchical Graph Flow (HiGFlow) network, which introduces a memory buffer variable of dynamic size to store previously seen information across variable resolutions. We theoretically show two key results: HiGFlow reduces smoothness when mapping onto new feature spaces in the hierarchy and non-strictly enhances the utility of message-passing by improving Weisfeiler-Lehman (WL) expressivity. Empirical results demonstrate that HiGFlow outperforms state-of-the-art baselines, including transformer models, by at least an average of 6.1% in MAE and 6.2% in RMSE. Code is available at https://github.com/TB862/HiGFlow.git.

Paper Structure

This paper contains 17 sections, 3 theorems, 12 equations, 5 figures, 3 tables.

Key Result

Theorem 3.4

The statistical transition function $M(C_{j,n}) = \sum_{u\in C_{j,n}} \mathbf{x}_{u,n}$ contracts the total Dirichlet energy when mapping from $\mathcal{X}(G_n)$ to $\mathcal{X}(G_m)$. In particular, for any $j \in V(G_{n})$, the strong condition holds:

Figures (5)

  • Figure 1: Hierarchical clustering on a graph and the corresponding relationships between regions within the graph.
  • Figure 2: The framework begins by embedding a time series onto a graph, where edges represent relative intra-series associations. Nodes are greedily clustered based on edge weights. Information at any resolution is handled by embedding cluster feature vectors into a lower-resolution space or lifting lower-resolution data upward. A memory-buffer variable, incorporating the time-series topology, iteratively builds the prediction. Finally, a neural network maps the memory buffer to the prediction horizon.
  • Figure 3: The benefit of a hierarchical framework when using a memory buffer. Hash function mappings of nodes with a cluster colour as an auxiliary argument are conditionally unique.
  • Figure 4: Model performance in terms of MAE while varying prediction buffer size.
  • Figure 5: Mean Absolute Error (MAE) as a function of hierarchy depth, showing the impact of varying the non-linearity of the embedding and lifting networks in HiGFlow.
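
The caption of Figure 2 says that nodes are greedily clustered based on edge weights, but the exact procedure is not given in this summary. As a rough illustration only, the sketch below uses greedy heavy-edge matching as a stand-in: edges are visited from heaviest to lightest, and two nodes are paired when neither has been assigned yet. The function name `greedy_edge_clustering` and the pairing rule are assumptions, not the authors' implementation.

```python
import numpy as np


def greedy_edge_clustering(weights: np.ndarray) -> list[list[int]]:
    """Pair nodes greedily by descending edge weight (heavy-edge matching).

    Hypothetical helper: `weights` is a symmetric (N, N) matrix of
    non-negative edge weights. Each node joins at most one pair, and any
    node left unmatched becomes a singleton cluster.
    """
    n = weights.shape[0]
    # Candidate undirected edges (i < j), sorted heaviest first.
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(-weights[rows, cols], kind="stable")

    assigned = np.zeros(n, dtype=bool)
    clusters: list[list[int]] = []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if weights[i, j] <= 0:
            break  # all remaining candidates have zero weight
        if not assigned[i] and not assigned[j]:
            clusters.append([i, j])
            assigned[i] = assigned[j] = True

    # Nodes that were never paired form their own clusters.
    clusters.extend([k] for k in range(n) if not assigned[k])
    return clusters


# Toy graph on four nodes: strong edges 0-1 and 2-3, weaker edges 1-2 and 0-3.
W = np.zeros((4, 4))
for (u, v), w in {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7, (0, 3): 0.1}.items():
    W[u, v] = W[v, u] = w

print(greedy_edge_clustering(W))
```

Applying the same function again to a coarsened weight matrix would give one possible way to build the deeper levels of a hierarchy like the one in Figure 1.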

Theorems & Definitions (6)

  • Definition 3.1: Abstract and Predecessor Graphs
  • Definition 3.2: Transition Function
  • Definition 3.3: Dirichlet Energy
  • Theorem 3.4
  • Theorem 3.5
  • Theorem 3.6
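
To make the quantities behind Definition 3.3 and Theorem 3.4 concrete, the sketch below computes a Dirichlet energy and applies the sum transition function $M(C_{j,n}) = \sum_{u\in C_{j,n}} \mathbf{x}_{u,n}$ to a toy graph. The paper's exact definition and normalisation of the Dirichlet energy are not reproduced in this summary, so the code assumes the common unnormalised form $\tfrac{1}{2}\sum_{u,v} a_{uv}\lVert \mathbf{x}_u - \mathbf{x}_v\rVert^2$; the function names are hypothetical, and the example does not attempt to verify the theorem's inequality.

```python
import numpy as np


def dirichlet_energy(adj: np.ndarray, x: np.ndarray) -> float:
    """Unnormalised Dirichlet energy (assumed form, not taken from the paper).

    E(X) = 1/2 * sum_{u,v} a_uv * ||x_u - x_v||^2, for a symmetric
    adjacency matrix `adj` of shape (N, N) and features `x` of shape (N, F).
    """
    diff = x[:, None, :] - x[None, :, :]      # pairwise differences, (N, N, F)
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared distances, (N, N)
    return 0.5 * float(np.sum(adj * sq_dist))


def sum_transition(x: np.ndarray, clusters: list[list[int]]) -> np.ndarray:
    """Sum transition M(C_j): add up the feature vectors in each cluster."""
    return np.stack([x[members].sum(axis=0) for members in clusters])


# Path graph 0-1-2-3 with unit edge weights and scalar node features.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
X = np.array([[0.0], [1.0], [2.0], [3.0]])

# The same energy written with the graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A
print(dirichlet_energy(A, X), float(np.trace(X.T @ L @ X)))

# One feature vector per cluster after the sum transition.
print(sum_transition(X, [[0, 1], [2, 3]]))
```

A low energy means neighbouring nodes carry similar features, which is the sense of "smoothness" used in the TL;DR and abstract.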