Long Range Propagation on Continuous-Time Dynamic Graphs

Alessio Gravina; Giulio Lovisotto; Claudio Gallicchio; Davide Bacciu; Claas Grohnfeldt

TL;DR

This work introduces CTAN, an ODE-based, non-dissipative graph neural network designed for Continuous-Time Dynamic Graphs (C-TDGs) to enable scalable long-range information propagation. By enforcing anti-symmetric weight matrices, CTAN achieves a stable, non-dissipative diffusion whose horizon is controlled by the terminal time $t_e$ and discretized via forward Euler, yielding a multi-layer propagation that can extend beyond local neighborhoods. The authors provide theoretical conditions for space-time non-dissipation and demonstrate strong empirical performance on synthetic long-range tasks (e.g., sequence classification on temporal path graphs) and real-world benchmarks (Temporal Pascal-VOC and future link prediction across several datasets), while also offering scalable efficiency and practical code. Overall, CTAN advances long-range context modeling in C-TDGs, reducing over-squashing and enabling robust propagation of historical information across irregular event streams.
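The forward Euler discretization mentioned above can be illustrated with a small sketch. The update rule, the sum aggregation over neighbours, and the names `euler_propagate`, `w`, `v`, `eps`, and `num_steps` are assumptions made for illustration and are not taken from the paper. The sketch only shows that each Euler step of size `eps` moves information one hop further, so that a horizon $t_e = \texttt{eps} \cdot \texttt{num\_steps}$ corresponds to `num_steps` propagation layers.

```python
import numpy as np

def euler_propagate(h, adj, w, v, eps=0.1, num_steps=1):
    """Forward-Euler iterations of a node-wise ODE (illustrative form only).

    Assumed dynamics: dh/dt = tanh(h (W - W^T)^T + (adj @ h) V^T).
    W - W^T is anti-symmetric for any square W.
    """
    a_w = w - w.T  # anti-symmetric recurrent weights
    for _ in range(num_steps):
        agg = adj @ h  # hypothetical aggregation: sum of neighbour states
        h = h + eps * np.tanh(h @ a_w.T + agg @ v.T)
    return h

rng = np.random.default_rng(0)
n, d = 4, 3

# Undirected path graph 0 - 1 - 2 - 3.
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

# Only node 0 carries a signal at the start.
h0 = np.zeros((n, d))
h0[0] = rng.standard_normal(d)
w = rng.standard_normal((d, d))
v = rng.standard_normal((d, d))

h2 = euler_propagate(h0, adj, w, v, num_steps=2)  # signal has reached node 2
h3 = euler_propagate(h0, adj, w, v, num_steps=3)  # signal has reached node 3
```

In this toy setting, node 3 sits three hops from the source, so its state stays at zero after two steps and becomes non-zero only after the third.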

Abstract

Learning Continuous-Time Dynamic Graphs (C-TDGs) requires accurately modeling spatio-temporal information on streams of irregularly sampled events. While many methods have been proposed recently, we find that most message passing-, recurrent- or self-attention-based methods perform poorly on long-range tasks. These tasks require correlating information that occurred "far" away from the current event, either spatially (higher-order node information) or along the time dimension (events that occurred in the past). To address long-range dependencies, we introduce Continuous-Time Graph Anti-Symmetric Network (CTAN). Grounded within the ordinary differential equations framework, our method is designed for efficient propagation of information. In this paper, we show how CTAN's (i) long-range modeling capabilities are substantiated by theoretical findings and how (ii) its empirical performance on synthetic long-range benchmarks and real-world benchmarks is superior to other methods. Our results motivate CTAN's ability to propagate long-range information in C-TDGs as well as the inclusion of long-range tasks in the evaluation of temporal graph models.

Paper Structure (18 sections, 2 theorems, 12 equations, 6 figures, 11 tables)

Introduction
Preliminaries
Continuous-Time Graph Anti-Symmetric Network (CTAN)
Experiments
Long Range Tasks
Sequence Classification on Temporal Path Graph
Classification on Temporal Pascal-VOC
Future Link Prediction Task
Related Work
Conclusion
Non-dissipativeness Over Time
Stability of the Forward Euler's Method
Summary of CTAN's Propagation Capacity
Datasets Description and Statistics
Explored Hyper-Parameter Space
...and 3 more sections

Key Result

Proposition 3.3

Provided that the weight matrix $\mathbf{W_t}$ is anti-symmetric (a matrix $\mathbf{M}\in\mathbb{R}^{d\times d}$ is anti-symmetric if $\mathbf{M}^\top = -\mathbf{M}$) and the aggregation function $\Phi$ does not depend on $\mathbf{h}_u(t)$, then the ODE in Eq. eq:our_ode_time is stable and non-dissipa
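The anti-symmetry condition in the proposition can be checked numerically. The sketch below builds an anti-symmetric matrix as $\mathbf{W} - \mathbf{W}^\top$ (a common parametrization, assumed here rather than quoted from the paper) and verifies the standard linear-algebra fact that the eigenvalues of a real anti-symmetric matrix have zero real part. How this fact enters the proof is not shown in this excerpt; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# Any square W yields an anti-symmetric matrix M = W - W^T.
w = rng.standard_normal((d, d))
m = w - w.T

# Definition from the proposition: M^T = -M.
is_antisymmetric = np.allclose(m.T, -m)

# Eigenvalues of a real anti-symmetric matrix are purely imaginary,
# so their real parts vanish up to floating-point error.
max_real_part = np.max(np.abs(np.linalg.eigvals(m).real))

# Equivalent view: the quadratic form x^T M x is zero for every x.
x = rng.standard_normal(d)
quadratic_form = float(x @ m @ x)
```

Purely imaginary eigenvalues mean that the linear map neither amplifies nor damps a signal, which matches the intuition behind the terms "stable" and "non-dissipative" used above.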

Figures (6)

Figure 1: The evolution of a Continuous-Time Dynamic Graph through the stream of events up to timestamp $t_4$. At each timestamp, the faded portion of the graph corresponds to historical information.
Figure 2: A high-level overview of the proposed framework illustrated for the $i$-th Cauchy sub-problem. On the left, we depict the propagation of the information of event $o_i$ through the graph. The faded portion of the graph corresponds to historical information, while the rest is the incoming event. On the right, we illustrate the evolution of node states given the propagation of the incoming event. Specifically, the top right shows the evolution as an ODE, $f_\theta$, that computes the node representation for a node $k$, $\mathbf{h}_k(t)$. Such computation is subject to an initial condition $\mathbf{h}_k(t_s)=\psi(\mathbf{h}_k^{i-1}(t_e), \mathbf{x}_k(i))$ that includes the node representations computed in the previous sub-problem $\mathbf{h}_k^{i-1}(t_e)$ and the current node input state. In the bottom right, the discretized solution of the ODE is computed as iterative steps of the method over a discrete set of points in the time interval $[t_s, t_e]$.
Figure 3: The illustration of the sequence classification task on a temporal path graph consisting of 5 nodes. The first node (colored in orange) has an initial feature that can be either $1$ or $-1$. All the other nodes and edges have a feature set to a random value sampled uniformly in $[-1, 1]$. At the end of the sequence, the representation computed for the last node (colored in red) is used to predict the original value of the first node. At each timestamp, the faded portion of the graph corresponds to historical information.
Figure 4: Construction of the Temporal PascalVOC-SP dataset. The SLIC algorithm extracts patches from an image. We create the rag-boundary graph connecting neighboring patches based on spatial closeness. We construct a temporal graph by traversing from the top-leftmost node with BFS. The goal of the task is to predict the class of the destination node at each visited edge - in the figure, either 'potted plant' (red) or 'background' (blue). For clarity in this visualization, the compactness of the SLIC algorithm is low.
Figure 5: Average time per epoch (measured in seconds) and standard deviation with respect to the embedding size computed on the Wikipedia dataset, averaged over 10 epochs. The experiments were carried out on an Intel(R) Xeon(R) Gold 6278C CPU @ 2.60GHz. On the left (a), each model has 1 DGN layer (when possible), while on the right (b) the models have 5 GCLs.
...and 1 more figures
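The sequence classification task described in the Figure 3 caption can be sketched as a small data generator. Scalar features, one edge event per integer timestamp, and the names `make_temporal_path_sample` and `events` are assumptions for illustration; the caption specifies only the path structure, the $\pm 1$ feature on the first node, and the uniform $[-1, 1]$ features elsewhere.

```python
import numpy as np

def make_temporal_path_sample(num_nodes=5, rng=None):
    """Generate one sample of the temporal path graph task (illustrative).

    Returns node features, a time-ordered list of edge events
    (src, dst, timestamp, edge_feature), and the target label.
    """
    rng = np.random.default_rng() if rng is None else rng

    # The first node carries the label: either 1 or -1.
    label = int(rng.choice([-1, 1]))

    # Every other node gets a random feature in [-1, 1].
    node_feat = rng.uniform(-1.0, 1.0, size=num_nodes)
    node_feat[0] = label

    # Assumed event stream: edge (i, i + 1) arrives at timestamp i.
    events = [
        (i, i + 1, float(i), float(rng.uniform(-1.0, 1.0)))
        for i in range(num_nodes - 1)
    ]
    return node_feat, events, label

node_feat, events, label = make_temporal_path_sample(
    num_nodes=5, rng=np.random.default_rng(0)
)
# A model would read the events in order and predict `label`
# from the representation of the last node, events[-1][1].
```

Longer paths make the task harder, because the label has to survive more propagation steps before it reaches the node used for prediction.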

Theorems & Definitions (5)

Definition 3.1: Non-dissipativeness over space
Definition 3.2: Non-dissipativeness over time
Proposition 3.3
Proposition 1.1
proof
