Port-Hamiltonian Architectural Bias for Long-Range Propagation in Deep Graph Networks

Simon Heilig, Alessio Gravina, Alessandro Trenta, Claudio Gallicchio, Davide Bacciu

TL;DR

The paper addresses the challenge of long-range information diffusion in deep graph networks by introducing port-Hamiltonian Deep Graph Networks (PH-DGN), a framework that models neural information flow as port-Hamiltonian dynamics to balance conservation and dissipation. By representing node states with momentum and position and evolving them through a Hamiltonian with optional damping and external forcing, PH-DGN enables both purely conservative long-range propagation and task-driven non-conservative behavior. The authors prove energy conservation and non-vanishing gradient properties in the conservative regime, and show how dissipative components can be learned to improve performance. Empirically, PH-DGN achieves state-of-the-art results on synthetic and real-world long-range propagation tasks, including graph property prediction and the Long-Range Graph Benchmark, while maintaining competitive runtimes and offering clear interpretability from a physics perspective.

Abstract

The dynamics of information diffusion within graphs is a critical open issue that heavily influences graph representation learning, especially when considering long-range propagation. This calls for principled approaches that control and regulate the degree of propagation and dissipation of information throughout the neural flow. Motivated by this, we introduce (port-)Hamiltonian Deep Graph Networks, a novel framework that models neural information flow in graphs by building on the laws of conservation of Hamiltonian dynamical systems. We reconcile under a single theoretical and practical framework both non-dissipative long-range propagation and non-conservative behaviors, introducing tools from mechanical systems to gauge the equilibrium between the two components. Our approach can be applied to general message-passing architectures, and it provides theoretical guarantees on information conservation in time. Empirical results prove the effectiveness of our port-Hamiltonian scheme in pushing simple graph convolutional architectures to state-of-the-art performance in long-range benchmarks.
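
To make the balance between conservation and dissipation concrete, here is a minimal sketch. It is not the authors' PH-DGN implementation. It assumes a hypothetical toy system: one scalar position and momentum per node on an 8-node ring, a hand-written quadratic Hamiltonian, a symplectic Euler integrator, and a constant damping coefficient. The external forcing term is left out, and all function names are illustrative.

```python
def ring_neighbors(n):
    """Neighbor lists of an undirected ring with n >= 3 nodes (assumed toy graph)."""
    return [((u - 1) % n, (u + 1) % n) for u in range(n)]


def energy(q, p, nbrs, k):
    """Toy Hamiltonian: kinetic + on-site + edge-coupling energy."""
    kinetic = 0.5 * sum(pu * pu for pu in p)
    onsite = 0.5 * sum(qu * qu for qu in q)
    # Every undirected edge appears twice in nbrs, hence 0.25 instead of 0.5.
    coupling = 0.25 * k * sum(
        (q[u] - q[v]) ** 2 for u in range(len(q)) for v in nbrs[u]
    )
    return kinetic + onsite + coupling


def grad_q(q, nbrs, k):
    """Partial derivative of the toy Hamiltonian with respect to each q_u."""
    return [q[u] + k * sum(q[u] - q[v] for v in nbrs[u]) for u in range(len(q))]


def simulate(n=8, steps=2000, eps=0.01, k=1.0, damping=0.0):
    """Integrate dq/dt = dH/dp, dp/dt = -dH/dq - damping * p.

    Uses symplectic Euler (momentum update first, then position).
    Returns the energy after every step, starting with the initial energy.
    """
    nbrs = ring_neighbors(n)
    q = [1.0] + [0.0] * (n - 1)  # excite a single node
    p = [0.0] * n
    energies = [energy(q, p, nbrs, k)]
    for _ in range(steps):
        g = grad_q(q, nbrs, k)
        p = [pu - eps * (gu + damping * pu) for pu, gu in zip(p, g)]
        q = [qu + eps * pu for qu, pu in zip(q, p)]
        energies.append(energy(q, p, nbrs, k))
    return energies


if __name__ == "__main__":
    for d in (0.0, 0.5):
        trace = simulate(damping=d)
        print(f"damping={d}: initial={trace[0]:.4f}, final={trace[-1]:.4f}")
```

With `damping=0.0` the energy trace should stay close to its initial value, while a positive `damping` drains it over time. This mirrors, in a very reduced setting, the conservative and dissipative regimes the abstract describes.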

Paper Structure (37 sections, 11 theorems, 71 equations, 5 figures, 8 tables)

Key Result

Theorem 2.1

The Jacobian matrix of the system defined by the ODE in eq:local_dyn possesses eigenvalues purely on the imaginary axis, i.e., $\mathrm{Re}(\lambda_i) = 0$ for all $i$, where $\lambda_i$ represents the $i$-th eigenvalue of the Jacobian.
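
The statement can be illustrated on a much simpler system than the one in the paper. The sketch below assumes a hypothetical single node with scalar position $q$ and momentum $p$ and a quadratic Hamiltonian $H(y) = \frac{1}{2} y^\top S y$ with a symmetric positive definite matrix $S$. Neither this Hamiltonian nor the helper names come from the paper. For this toy, the Jacobian of $\dot{y} = J S y$ is the constant matrix $J S$, and its eigenvalues can be computed in closed form.

```python
import cmath


def hamiltonian_jacobian(a, b, c):
    """Return J @ S for S = [[a, b], [b, c]] and J = [[0, 1], [-1, 0]].

    S is the (assumed) Hessian of a quadratic toy Hamiltonian in y = (q, p);
    J is the canonical skew-symmetric coupling matrix.
    """
    return [[b, c], [-a, -b]]


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


if __name__ == "__main__":
    # S = [[2, 0.5], [0.5, 1]] is positive definite: 2 > 0 and 2*1 - 0.25 > 0.
    lam1, lam2 = eigenvalues_2x2(hamiltonian_jacobian(2.0, 0.5, 1.0))
    print(lam1, lam2)
```

Here the trace of $J S$ is zero and its determinant equals $\det S > 0$, so the eigenvalues are $\pm i \sqrt{\det S}$. The positive definiteness of $S$ is an assumption of this toy example; the paper's theorem concerns its own Hamiltonian and is not reproduced by this check.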

Figures (5)

  • Figure 1: A high-level overview of the proposed port-Hamiltonian Deep Graph Network. It summarizes how the initial node state ${\bf x}_u(0)$ is propagated by means of energy preservation up until the terminal time $T$ (i.e., layer $L$), ${\bf x}_u(T)$. While the global system's state $\mathbf{y}$ evolves preserving energy, external forces (i.e., dampening $D(\mathbf{y})$ and external control $F(\mathbf{y},t)$) can intervene to alter its conservative trajectory. The gray trajectories between the initial and final states represent the continuous evolution of the system. The discrete message passing step from layer $\ell$ to $\ell +1$, which is shown in the middle of the figure, is given by the coupling of coordinates ${\bf q}$ and momenta ${\bf p}$ in terms of neighborhood aggregation $\Phi_{\mathcal{G}}$ and influence to adjacent neighbors $\Phi^*_{\mathcal{G}}$. Self-influence on both ${\bf q}$ and ${\bf p}$ from the previous step $\ell$ is omitted for simplicity.
  • Figure 2: (a) Time evolution of the energy difference to the initial state $\mathbf{y}(0) =\mathbf{y}^{0}$ obtained from one forward pass of conservative PH-DGN with fixed random weights on the Carbon-60 graph with three different numbers of layers given by $T/\epsilon$. The sensitivity $\|{\partial\mathbf{x}_u^{(L)}}/{\partial\mathbf{x}_u^{(\ell)}}\|$ of 15 different node states to their final embedding obtained by backpropagation on the Carbon-60 graph after (b) $T=10$ and $\epsilon=0.1$ (i.e., 100 layers) and (c) $T=300$ and $\epsilon=0.3$ (i.e., 1000 layers). The log scale's horizontal line at $0$ indicates the theoretical lower bound.
  • Figure 3: Information transfer performance on (a) Line, (b) Ring, and (c) Crossed-Ring graphs. Overall, baseline approaches are not able to transfer the information accurately as the distance increases, while non-dissipative methods like A-DGN and our PH-DGN achieve low errors.
  • Figure 4: Three topologies for Graph Transfer. Left) Line. Center) Ring. Right) Crossed-Ring. The distance between source and target nodes is equal to 5. Nodes marked with S are source nodes, while the nodes with a T are target nodes.
  • Figure 5: Information transfer performance on (a) Line, (b) Ring, and (c) Crossed-Ring graphs, including the purely conservative PH-DGN (i.e., PH-DGNC) and PH-DGN with driving forces.
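
The Figure 2 caption relates depth to integration time through $T/\epsilon$ layers and reports a sensitivity norm with a lower bound at $0$ on the log scale, i.e., a norm of at least $1$. The sketch below is a toy analogue only, assuming a hypothetical single linear oscillator integrated with symplectic Euler; it is not the paper's model or its proof. In this toy, each step's Jacobian has determinant one, so the product of singular values of the end-to-end Jacobian is one and its largest singular value cannot drop below one, however many steps are composed.

```python
import math


def step_jacobian(eps, omega=1.0):
    """Jacobian of one symplectic-Euler step for H = (p^2 + omega^2 q^2) / 2.

    Update: p' = p - eps * omega^2 * q, then q' = q + eps * p'.
    State ordering is (q, p); the determinant is exactly 1 in exact arithmetic.
    """
    w2 = omega * omega
    return [[1.0 - eps * eps * w2, eps], [-eps * w2, 1.0]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def spectral_norm2(m):
    """Largest singular value of a 2x2 matrix via the eigenvalues of m^T m."""
    f2 = sum(x * x for row in m for x in row)  # trace of m^T m
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = max(0.0, f2 * f2 - 4.0 * det * det)  # clamp rounding noise
    return math.sqrt((f2 + math.sqrt(disc)) / 2.0)


def sensitivity(total_time, eps, omega=1.0):
    """Return (number of layers, norm of d state(T) / d state(0)) for the toy."""
    num_layers = round(total_time / eps)
    jac = [[1.0, 0.0], [0.0, 1.0]]
    step = step_jacobian(eps, omega)
    for _ in range(num_layers):
        jac = matmul2(step, jac)
    return num_layers, spectral_norm2(jac)


if __name__ == "__main__":
    # Settings taken from the Figure 2 caption: (T, eps) = (10, 0.1) and (300, 0.3).
    for total_time, eps in ((10, 0.1), (300, 0.3)):
        layers, norm = sensitivity(total_time, eps)
        print(f"T={total_time}, eps={eps}: {layers} layers, log norm={math.log(norm):.4f}")
```

The two settings correspond to 100 and 1000 layers, matching the counts given in the caption. The printed log norms should be non-negative in both cases, which is the toy counterpart of the horizontal line at $0$.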

Theorems & Definitions (21)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Theorem A.1
  • Theorem A.2
  • proof
  • proof
  • Lemma B.1
  • proof
  • ...and 11 more