Bundle Neural Networks for message diffusion on graphs

Jacob Bamberger, Federico Barbero, Xiaowen Dong, Michael M. Bronstein

TL;DR

BuNNs address fundamental limits of local message passing in GNNs by replacing neighbor-centric messaging with global diffusion of features over flat vector bundles via the bundle heat kernel $\mathcal{H}_\mathcal{B}(t) = \exp(-t\,\mathbf{\mathcal{L}}_\mathcal{B})$. The model learns per-node orthogonal maps and propagates information through diffusion, mitigating over-smoothing and over-squashing while enabling long-range interactions. The authors prove a uniform universal approximation guarantee over compact sets under injective positional encodings and demonstrate state-of-the-art results on heterophilic and long-range graph benchmarks, including the Long Range Graph Benchmark (LRGB) task Peptides-func, establishing BuNNs as a scalable, theory-backed alternative to standard MPNNs and prior diffusion-based approaches.
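
To make the bundle heat kernel concrete, here is a minimal NumPy/SciPy sketch. It assumes the bundle Laplacian has the form $\mathcal{L}_\mathcal{B} = \mathbf{O}^\top(\mathbf{L}\otimes\mathbf{I}_d)\mathbf{O}$, where $\mathbf{L}$ is the unnormalized graph Laplacian and $\mathbf{O}$ is block-diagonal in the per-node orthogonal maps; the paper may use a normalized Laplacian, and the helper names (`graph_laplacian`, `random_orthogonal_maps`, `bundle_laplacian`, `bundle_heat_kernel`) are hypothetical. On a 4-node cycle, diffusing for a long time makes every node's features agree once they are expressed in the shared frame, which is the synchronization effect of a large $t$.

```python
import numpy as np
from scipy.linalg import expm


def graph_laplacian(edges, n):
    """Unnormalized graph Laplacian L = D - A (assumption: the paper may normalize)."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A


def random_orthogonal_maps(n, d, rng):
    """Hypothetical stand-in for learned maps: one random d x d orthogonal matrix per node."""
    # The Q factor of a Gaussian matrix's QR decomposition is orthogonal.
    return np.stack([np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)])


def bundle_laplacian(L, O):
    """L_B = O^T (L kron I_d) O, where O is block-diagonal with blocks O[v]."""
    n, d, _ = O.shape
    O_block = np.zeros((n * d, n * d))
    for v in range(n):
        O_block[v * d:(v + 1) * d, v * d:(v + 1) * d] = O[v]
    return O_block.T @ np.kron(L, np.eye(d)) @ O_block


def bundle_heat_kernel(L_B, t):
    """H_B(t) = exp(-t L_B), computed densely (suitable for toy graphs only)."""
    return expm(-t * L_B)


# Usage: a 4-node, 4-edge cycle with 2-dimensional node features.
rng = np.random.default_rng(0)
n, d = 4, 2
L = graph_laplacian([(0, 1), (1, 2), (2, 3), (3, 0)], n)
O = random_orthogonal_maps(n, d, rng)
L_B = bundle_laplacian(L, O)
x = rng.standard_normal(n * d)            # stacked node features x_0, ..., x_3
y = bundle_heat_kernel(L_B, t=50.0) @ x   # diffuse for a long time

# Express each node's output in the shared frame: O_v y_v.
aligned = np.stack([O[v] @ y[v * d:(v + 1) * d] for v in range(n)])
target = np.mean([O[u] @ x[u * d:(u + 1) * d] for u in range(n)], axis=0)
print(np.allclose(aligned, target, atol=1e-6))  # True: features synchronize
```

For a connected graph, $\exp(-t\mathbf{L})$ tends to the averaging matrix as $t$ grows, so every node converges to the mean of the rotated input features.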

Abstract

The dominant paradigm for learning on graph-structured data is message passing. Despite being a strong inductive bias, the local message passing mechanism suffers from pathological issues such as over-smoothing, over-squashing, and limited node-level expressivity. To address these limitations, we propose Bundle Neural Networks (BuNN), a new type of GNN that operates via message diffusion over flat vector bundles: structures analogous to connections on Riemannian manifolds that augment the graph by assigning to each node a vector space and an orthogonal map. A BuNN layer evolves the features according to a diffusion-type partial differential equation. When discretized, BuNNs are a special case of Sheaf Neural Networks (SNNs), a recently proposed MPNN capable of mitigating over-smoothing. The continuous nature of message diffusion enables BuNNs to operate on larger scales of the graph and, therefore, to mitigate over-squashing. Finally, we prove that BuNN can approximate any feature transformation over nodes on any (potentially infinite) family of graphs given injective positional encodings, resulting in universal node-level expressivity. We support our theory via synthetic experiments and showcase the strong empirical performance of BuNNs over a range of real-world tasks, achieving state-of-the-art results on several standard benchmarks in transductive and inductive settings.
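
The abstract's link to Sheaf Neural Networks and its over-squashing argument can be checked numerically. The sketch below assumes unnormalized Laplacians and uses hypothetical helper names. It builds the sheaf Laplacian edge by edge with every restriction map $\mathcal{F}_{v \trianglelefteq e}$ set to the node's orthogonal map $\mathbf{O}_v$, confirms that it equals $\mathbf{O}^\top(\mathbf{L}\otimes\mathbf{I}_d)\mathbf{O}$, and then contrasts one explicit Euler step of the diffusion (which, like a message-passing layer, only mixes 1-hop neighbours) with the continuous heat kernel, which couples nodes that are several hops apart.

```python
import numpy as np
from scipy.linalg import expm


def sheaf_laplacian(edges, O):
    """Sheaf Laplacian with restriction maps F_{v<|e} = O_v for every edge e at v.

    Each edge (u, v) adds F^T F = I to the diagonal blocks (u, u) and (v, v),
    and subtracts O_u^T O_v and O_v^T O_u from the off-diagonal blocks.
    """
    n, d, _ = O.shape
    L_F = np.zeros((n * d, n * d))

    def blk(i):
        return slice(i * d, (i + 1) * d)

    for u, v in edges:
        L_F[blk(u), blk(u)] += np.eye(d)
        L_F[blk(v), blk(v)] += np.eye(d)
        L_F[blk(u), blk(v)] -= O[u].T @ O[v]
        L_F[blk(v), blk(u)] -= O[v].T @ O[u]
    return L_F


def bundle_laplacian(edges, O):
    """O^T (L kron I_d) O with L = D - A, the unnormalized graph Laplacian."""
    n, d, _ = O.shape
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    O_block = np.zeros((n * d, n * d))
    for v in range(n):
        O_block[v * d:(v + 1) * d, v * d:(v + 1) * d] = O[v]
    return O_block.T @ np.kron(L, np.eye(d)) @ O_block


rng = np.random.default_rng(1)
n, d = 4, 2
edges = [(0, 1), (1, 2), (2, 3)]  # path graph: nodes 0 and 3 are 3 hops apart
O = np.stack([np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)])

L_F = sheaf_laplacian(edges, O)
L_B = bundle_laplacian(edges, O)
print(np.allclose(L_F, L_B))  # True: the two constructions give the same operator

euler_step = np.eye(n * d) - 0.1 * L_B  # one discrete diffusion step
heat = expm(-1.0 * L_B)                 # continuous diffusion for t = 1
print(np.abs(euler_step[0:d, 3 * d:4 * d]).max())    # 0.0: no 0 -> 3 interaction
print(np.abs(heat[0:d, 3 * d:4 * d]).max() > 1e-3)   # True: heat kernel couples 0 and 3
```

A discrete scheme needs as many steps as the hop distance to connect two nodes, whereas the heat kernel reaches the whole connected component in one application.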

Paper Structure (30 sections, 8 theorems, 16 equations, 5 figures, 13 tables, 2 algorithms)

Key Result

Lemma 3.1

For every node $v$, the solution at time $t$ of the heat equation on a connected bundle $\mathsf{G} = \left(\mathsf{V},\ \mathsf{E},\ \mathbf{O} \right)$ with input node features $\mathbf{X}$ satisfies:

$$\left(\mathcal{H}_\mathcal{B}(t)\,\mathbf{X}\right)_v = \sum_{u \in \mathsf{V}} \mathcal{H}(t,\ v,\ u)\,\mathbf{O}_v^{\top}\mathbf{O}_u\,\mathbf{x}_u,$$

where $\mathcal{H}(t)$ is the standard graph heat kernel, and $\mathcal{H}(t,\ v,\ u)\in \mathbb{R}$ is its entry at $(v,\ u)$.
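
Lemma 3.1 means that bundle diffusion never needs the $nd \times nd$ kernel: rotate each feature with $\mathbf{O}_u$, apply the scalar heat kernel $\mathcal{H}(t)$, and rotate back with $\mathbf{O}_v^\top$. The NumPy/SciPy sketch below checks this against a dense $\exp(-t\,\mathcal{L}_\mathcal{B})$, assuming $\mathcal{L}_\mathcal{B} = \mathbf{O}^\top(\mathbf{L}\otimes\mathbf{I}_d)\mathbf{O}$ with an unnormalized graph Laplacian; the function names are illustrative, not the authors' code.

```python
import numpy as np
from scipy.linalg import expm


def heat_diffuse_via_lemma(L, O, X, t):
    """Row v of the result is sum_u H(t, v, u) O_v^T O_u x_u (Lemma 3.1).

    L: (n, n) graph Laplacian; O: (n, d, d) orthogonal maps; X: (n, d) features.
    """
    H = expm(-t * L)                          # scalar graph heat kernel H(t)
    synced = np.einsum("uij,uj->ui", O, X)    # O_u x_u for every node u
    mixed = H @ synced                        # sum_u H(t, v, u) O_u x_u
    return np.einsum("vji,vj->vi", O, mixed)  # apply O_v^T to each row


def heat_diffuse_dense(L, O, X, t):
    """Reference: exp(-t L_B) on stacked features, with L_B = O^T (L kron I_d) O."""
    n, d = X.shape
    O_block = np.zeros((n * d, n * d))
    for v in range(n):
        O_block[v * d:(v + 1) * d, v * d:(v + 1) * d] = O[v]
    L_B = O_block.T @ np.kron(L, np.eye(d)) @ O_block
    return (expm(-t * L_B) @ X.reshape(-1)).reshape(n, d)


rng = np.random.default_rng(2)
n, d = 5, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A
O = np.stack([np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)])
X = rng.standard_normal((n, d))
print(np.allclose(heat_diffuse_via_lemma(L, O, X, 0.7),
                  heat_diffuse_dense(L, O, X, 0.7)))  # True
```

The factorized form costs one scalar heat-kernel application plus $2n$ small matrix-vector products. This sketch still uses a dense matrix exponential for $\mathcal{H}(t)$ and says nothing about how the paper computes it on large graphs.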

Figures (5)

  • Figure 1: Local message passing on graphs versus global message diffusion on bundles.
  • Figure 2: Comparison of different Laplacians and their actions on signals.
  • Figure 3: Example of the message diffusion framework on a graph with $4$ nodes and $4$ edges. From left to right: the input is a simple graph embedding, with each color representing the feature vector at that node. (1) An orthogonal map is computed for each node by embedding the nodes in a continuous manifold with local reference frames (represented as a torus for visual aid); the features, represented as colored vectors, do not change. (2) The features are updated using learnable parameters $\mathbf{W}$. (3) The features are diffused for some time $t$ according to the heat equation on the manifold: a larger value of $t$ leads to stronger synchronization between all nodes, as illustrated by the alignment of node features with respect to their local coordinates. (4) The output embedding is obtained by discarding the local coordinates and applying a non-linearity.
  • Figure 4: Synthetic over-squashing (left) and over-smoothing (right) tasks. In both cases, blue nodes output the average over the red nodes and vice versa.
  • Figure 5: Results for Tree-NeighborsMatch task.
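
The four steps of Figure 3 can be sketched as a single layer. This is a hedged illustration, not the authors' implementation: it assumes $d = 2$, obtains each node's rotation from a hypothetical angle predictor `theta = pi * tanh(X @ a)` (a stand-in for whatever learned network the paper uses to compute the maps), applies the weights $\mathbf{W}$ inside each node's frame as $\mathbf{O}_v^\top \mathbf{W} \mathbf{O}_v \mathbf{x}_v + \mathbf{b}$ (also an assumption), diffuses with the Lemma 3.1 factorization using an unnormalized Laplacian, and ends with a ReLU.

```python
import numpy as np
from scipy.linalg import expm


def rotation(theta):
    """2 x 2 rotation matrix (an element of SO(2), hence orthogonal)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def bunn_layer_sketch(X, L, W, b, a, t):
    """One hypothetical BuNN-style layer on 2-dimensional node features.

    X: (n, 2) features; L: (n, n) graph Laplacian; W: (2, 2) weights;
    b: (2,) bias; a: (2,) parameters of the hypothetical angle predictor.
    """
    # (1) One orthogonal map per node from a toy angle predictor (assumption).
    O = np.stack([rotation(th) for th in np.pi * np.tanh(X @ a)])
    # (2) Update with W in each node's frame: O_v^T W O_v x_v + b (assumption).
    h = np.einsum("vji,jk,vkl,vl->vi", O, W, O, X) + b
    # (3) Diffuse for time t via Lemma 3.1: rotate, apply H(t), rotate back.
    synced = np.einsum("uij,uj->ui", O, h)
    mixed = expm(-t * L) @ synced
    Z = np.einsum("vji,vj->vi", O, mixed)
    # (4) Point-wise non-linearity.
    return np.maximum(Z, 0.0)


rng = np.random.default_rng(3)
n = 4
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)  # 4-node, 4-edge cycle as in Figure 3
L = np.diag(A.sum(axis=1)) - A
X = rng.standard_normal((n, 2))
W = rng.standard_normal((2, 2))
b = np.zeros(2)
a = rng.standard_normal(2)
out = bunn_layer_sketch(X, L, W, b, a, t=1.0)
print(out.shape)  # (4, 2)
```

Because step (3) goes through the scalar heat kernel, a single layer with a large $t$ mixes features across the whole connected graph, unlike the one-hop reach of a message-passing layer.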

Theorems & Definitions (21)

  • Lemma 3.1
  • Proposition 4.1
  • Proposition 4.2
  • Lemma 4.3
  • Corollary 4.4
  • Definition 5.1
  • Proposition 5.2
  • Theorem 5.3
  • proof
  • proof
  • ...and 11 more