EasyDGL: Encode, Train and Interpret for Continuous-time Dynamic Graph Learning

Chao Chen, Haoyu Geng, Nianzu Yang, Xiaokang Yang, Junchi Yan

TL;DR

This work presents EasyDGL, a unified pipeline for continuous-time dynamic graph learning with three modules: encoding, training, and interpreting. It introduces a TPP-modulated Attention-Intensity-Attention encoder that captures entangled spatiotemporal dynamics, a principled training scheme combining TPP posterior maximization (TPPLE) with Correlation-adjusted Masking (CaM) to support multiple tasks, and a scalable spectral interpretation framework based on an orthogonalized graph Laplacian decomposition and spectral perturbations. The approach yields strong empirical gains across dynamic link prediction, dynamic node classification, and node traffic forecasting, and provides model-level insights into the role of frequency content in predictions. By enabling global spectral explanations and efficient computation on large graphs, EasyDGL offers both improved accuracy and interpretable understanding of continuous-time dynamic graphs with edge-addition events.
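The TL;DR mentions an orthogonalized graph Laplacian decomposition used for scalability. The paper's exact procedure is not reproduced in this summary. As a minimal sketch, assuming only a few low-frequency components are needed, one can compute a partial, orthonormal eigenbasis of the sparse normalized Laplacian with SciPy's Lanczos-based `eigsh` instead of a dense $O(n^3)$ decomposition. The helper names `normalized_laplacian` and `low_frequency_basis` are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def normalized_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} as a sparse CSR matrix (undirected graph)."""
    adj = sp.csr_matrix(adj, dtype=float)
    deg = np.asarray(adj.sum(axis=1)).ravel()
    # Isolated nodes (degree 0) get a zero scaling factor instead of inf.
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    d_inv_sqrt = sp.diags(inv_sqrt)
    eye = sp.identity(adj.shape[0], format="csr")
    return (eye - d_inv_sqrt @ adj @ d_inv_sqrt).tocsr()


def low_frequency_basis(adj, k):
    """Hypothetical helper: the k smallest eigenpairs of the normalized Laplacian.

    eigsh runs Lanczos iterations on the sparse matrix, so the full n x n
    eigendecomposition is never formed; the returned eigenvectors are orthonormal.
    """
    lap = normalized_laplacian(adj)
    vals, vecs = eigsh(lap, k=k, which="SA")  # "SA" = smallest algebraic eigenvalues
    order = np.argsort(vals)
    return vals[order], vecs[:, order]


# Usage: an undirected path graph 0-1-...-7.
n = 8
rows = np.arange(n - 1)
adj = sp.coo_matrix((np.ones(n - 1), (rows, rows + 1)), shape=(n, n))
adj = adj + adj.T
vals, vecs = low_frequency_basis(adj, k=3)
```

For a path graph on $n$ nodes the normalized-Laplacian eigenvalues are $1-\cos(\pi j/(n-1))$, so the three returned values should match that formula for $j=0,1,2$.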

Abstract

Dynamic graphs arise in various real-world applications, and it is often desirable to model their dynamics directly in the continuous-time domain for flexibility. This paper aims to design an easy-to-use pipeline (termed EasyDGL, a name that also reflects its implementation with the DGL toolkit) composed of three key modules with both strong fitting ability and interpretability. Specifically, the pipeline involves encoding, training and interpreting: i) a temporal point process (TPP) modulated attention architecture that endows the continuous-time resolution with the coupled spatiotemporal dynamics of the observed graph with edge-addition events; ii) a principled loss composed of a task-agnostic TPP posterior maximization based on observed events on the graph, and a task-aware loss with a masking strategy over the dynamic graph, where the covered tasks include dynamic link prediction, dynamic node classification and node traffic forecasting; iii) interpretation of the model outputs (e.g., representations and predictions) with scalable perturbation-based quantitative analysis in the graph Fourier domain, which can reflect the behavior of the learned model more comprehensively. Extensive experimental results on public benchmarks show the superior performance of EasyDGL on time-conditioned predictive tasks, and in particular demonstrate that EasyDGL can effectively quantify the predictive power of the frequency content that a model learns from evolving graph data.
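The task-agnostic part of the objective maximizes a TPP posterior over observed events. For readers new to TPP objectives, the standard building block is the point-process log-likelihood $\log L = \sum_i \log\lambda^\ast(t_i) - \int_0^T \lambda^\ast(s)\,ds$. The sketch below evaluates it for a univariate Hawkes process with exponential kernel $\lambda^\ast(t) = \mu + \alpha\sum_{t_j<t} e^{-\beta(t-t_j)}$. This kernel and the parameters `mu`, `alpha`, `beta` are assumptions for illustration only; EasyDGL's intensity is coupled with graph attention and its posterior objective may include further terms.

```python
import numpy as np


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t given events strictly before t (exponential kernel)."""
    past = history[history < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()


def hawkes_log_likelihood(times, T, mu, alpha, beta):
    """Log-likelihood of event times in [0, T]: sum of log-intensities minus the compensator."""
    times = np.sort(np.asarray(times, dtype=float))
    log_term = sum(np.log(hawkes_intensity(t, times, mu, alpha, beta)) for t in times)
    # Closed-form integral of the intensity over [0, T] for the exponential kernel.
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
    return log_term - compensator


# Usage: three events observed on the window [0, 5].
events = np.array([1.0, 2.0, 3.0])
ll = hawkes_log_likelihood(events, T=5.0, mu=0.5, alpha=0.8, beta=1.2)
```

Setting `alpha=0` recovers a homogeneous Poisson process, whose log-likelihood is $n\log\mu - \mu T$; this is a quick sanity check on the compensator term.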

Paper Structure (57 sections, 8 theorems, 56 equations, 9 figures, 7 tables, 1 algorithm)

Key Result

Proposition 1

Given $\textit{perturb.}~\hat{\mathbf{y}}$, obtained by perturbing $\hat{\mathbf{y}}$ with Eq. eq:pertb, its Fourier transform has no support in $\mathcal{S}$, i.e.,
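Eq. eq:pertb itself is not reproduced in this summary. As a minimal sketch, assume the perturbation removes the frequency components indexed by $\mathcal{S}$ by projecting the prediction onto the orthogonal complement of the corresponding Laplacian eigenvectors, $\hat{\mathbf{y}} - \mathbf{U}_{\mathcal{S}}\mathbf{U}_{\mathcal{S}}^\top\hat{\mathbf{y}}$. The graph Fourier coefficients on $\mathcal{S}$ then vanish, which is the property stated in Proposition 1. The helper name `remove_band` and the choice of the combinatorial Laplacian $\mathbf{D}-\mathbf{A}$ are assumptions.

```python
import numpy as np


def laplacian_eigenbasis(adj):
    """Eigenvalues (ascending) and orthonormal eigenvectors of L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigh(lap)


def remove_band(y_hat, eigvecs, band):
    """Hypothetical perturbation: zero out the graph-frequency components listed in `band`."""
    u_s = eigvecs[:, band]
    return y_hat - u_s @ (u_s.T @ y_hat)


# Usage on a random undirected graph with a scalar prediction per node.
rng = np.random.default_rng(0)
n = 10
upper = np.triu((rng.random((n, n)) < 0.4).astype(float), k=1)
adj = upper + upper.T
eigvals, eigvecs = laplacian_eigenbasis(adj)
y_hat = rng.normal(size=n)
band = [0, 2, 5]  # the index set S
y_pert = remove_band(y_hat, eigvecs, band)
spectrum = eigvecs.T @ y_pert  # graph Fourier transform of the perturbed prediction
```

Because the eigenvectors are orthonormal, the projection leaves every coefficient outside $\mathcal{S}$ unchanged, so any change in model behavior can be attributed to the removed band.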

Figures (9)

  • Figure 1: Example of a dynamic graph where time $t_i$ corresponds to an event of adding edge $(B,C)$. This event changes both the graph structure and the node attributes: the connection between node $B$ and node $C$ changes, and the attribute value of node $C$ (in blue) drops significantly. Moreover, the attribute of node $C$ shows a stable trend before the addition of edge $(B,C)$ at time $t_i$, then exhibits an upward trend until the event of edge $(A,B)$ emerges at time $t_k$. This suggests that the structural and temporal dynamics of the observed graph can be entangled.
  • Figure 2: EasyDGL (encode-train-interpret) pipeline for continuous-time graphs. (a) We present the attention-intensity-attention architecture to encode the graph with its event history into node embeddings, for example $\mathbf{h}^{(t)}_{4;l}$ of node $v_4$, which is output by the $l^\mathrm{th}$ layer and used as input to predict what will happen at a given future time $t$. In this example, four edge-addition events occur at times $t_0, t_1, t_2, t_3$ respectively. The TPP intensity function is adopted to model the dynamics of events and is then used to modulate attention networks on the graph (see the encoding section). Specifically, the attention (gray) encodes endogenous dynamics within the neighborhood into $\mathbf{s}^{(t)}_{4;l}$, which is then coupled with exogenous temporal dynamics through the intensities $\lambda^\ast_{k_1}(t),\lambda^\ast_{k_2}(t),\lambda^\ast_{k_3}(t)$ (blue) that specify the occurrence of events on edges $(v_1,v_4),(v_2,v_4),(v_3,v_4)$ at time $t$. This quantity, which captures the evolution of the graph, is used to modulate the message passing of the attention (green); (b) We consider three popular tasks, dynamic link prediction (see the link prediction section), dynamic node classification (see the node classification section) and node traffic forecasting (see the traffic forecasting section), under a unified masking-based training scheme; (c) Finally, perturbation-based analysis is performed in a scalable way to quantify the importance of the frequency content that the model learns from the data (see the interpretation section). In particular, it enables model-level interpretation instead of the instance-level interpretation done in most existing explainable graph learning works [fan2021gcn, pope2019explainability, yuan2021explainability, yuan2022explainability].
  • Figure 3: TPP-based interpretation of EasyDGL for dynamic link prediction on Netflix, showing a user's intensities for different event types (i.e., item clusters, as defined in the paper) over time. This user has consistent interests in movies of types 0 and 1, while the interests in movies of types 5 to 7 are transient, i.e., confined to $[t_{11},t_{17}]$ within one day.
  • Figure 4: Encoding and training of EasyDGL on three typical dynamic prediction tasks. Panel (a) operates in an auto-regressive manner, while the three panels on the right, devised in this paper, are based on auto-encoding. (a) Our conference work [chen2021learning] (i.e., CTSMA), tailored for dynamic link prediction; (b) Link prediction (see the link prediction section): given a sequence of user behaviors $\{v_1,v_2,v_3\}$ on a user-item bipartite graph where $v_2$ is randomly selected for masking, at the $1^\mathrm{st}$ layer the time deltas defined in the attention equation are set to $t_1-t_0$, $t_2-t_1$ and $t_3-t_2$ respectively when computing $\mathbf{h}^{(t_1)}_{1;1},\mathbf{h}^{(t_2)}_{2;1}$ and $\mathbf{h}^{(t_3)}_{3;1}$. Note that the output $\mathbf{h}^{(t_2)}_{2;2}$ at the $2^\mathrm{nd}$ layer has direct access to the information of all three time intervals with a constant path length of $\mathcal{O}(1)$, which is superior to recursive architectures [mei2017neural, trivedi2019dyrep]; (c) Node classification (see the node classification section): let $v_6$ denote the masked query node whose input features are replaced by its true label $\mathbf{y}_6^{(t)}$; if the one-hop neighbors $v_3,v_4,v_5$ were most recently updated at times $t_1,t_3,t_4$ respectively, then the time intervals are set to $t-t_1,t-t_3,t-t_4$ when computing $\mathbf{h}_{3;1}^{(t)}, \mathbf{h}_{4;1}^{(t)},\mathbf{h}_{5;1}^{(t)}$; (d) Traffic forecasting (see the traffic forecasting section): $v_6$ denotes the masked query node whose input features are replaced by its true speed reading $\mathbf{y}_6^{(t)}$. Note that $\bar{t}_5$ ($\bar{t}_6$) represents the time of the previous traffic congestion event on $v_5$ ($v_6$).
  • Figure 5: Comparison between the typical masking used in [tailor2021degree, hou2022graphmae, thakoor2022largescale] and our proposed correlation-adjusted masking (CaM) on graphs, where $v_3,v_4,v_5$ are connected with $v_6$ at times $t_5,t_6,t_7$ respectively and $t$ denotes the future time of interest. We add time embeddings TE($t_3$), TE($t_5$) to the key nodes $v_3,v_5$ and TE($t$) to the query node $v_6$.
  • ...and 4 more figures
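Figure 2 describes attention whose message passing is modulated by the TPP intensities $\lambda^\ast_{k}(t)$ of edge events around a node; the exact equations are in the paper's encoding section and are not reproduced here. As a hedged, single-head simplification, one way to let intensities reweight attention is to add $\log\lambda^\ast_k(t)$ to the attention logits, so that each neighbor's weight is proportional to $\exp(\text{score}_k)\,\lambda^\ast_k(t)$. This combination rule, the softplus intensity of the time gap, and all function and parameter names are assumptions for illustration, not the paper's attention-intensity-attention layer.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)


def toy_intensity(delta_t, a=1.0, b=0.5):
    """Hypothetical positive intensity that decays with the time since a neighbor's last event."""
    return softplus(a - b * np.asarray(delta_t, dtype=float))


def intensity_modulated_attention(query, keys, values, intensities):
    """Single-head attention whose weights are proportional to exp(score) * intensity.

    query: (d,); keys, values: (m, d); intensities: (m,) strictly positive.
    """
    scores = keys @ query / np.sqrt(query.shape[0])
    logits = scores + np.log(intensities)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights @ values, weights


# Usage: node v4 attends over neighbors v1, v2, v3 with different time gaps.
rng = np.random.default_rng(0)
d = 4
query = rng.normal(size=d)
keys = rng.normal(size=(3, d))
values = rng.normal(size=(3, d))
lam = toy_intensity([0.5, 2.0, 6.0])  # intensities for edges (v1,v4), (v2,v4), (v3,v4)
out, weights = intensity_modulated_attention(query, keys, values, lam)
```

With equal intensities the weights reduce to ordinary softmax attention; raising one neighbor's intensity shifts weight toward it, which is the qualitative behavior the caption describes.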

Theorems & Definitions (22)

  • Definition 1: Graph
  • Definition 2: Dynamic Graph
  • Definition 3: Continuous-time Representation
  • Definition 4: Dynamic Link Prediction
  • Definition 5: Dynamic Node Classification
  • Definition 6: Traffic Forecasting
  • Definition 7: TPP on Dynamic Graph and Events of Edge Addition
  • Definition 8: Graph Signal
  • Definition 9: Graph Fourier Transform
  • Proposition 1
  • ...and 12 more
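Definitions 8 and 9 are listed by title only. In the standard formulation, a graph signal assigns a value to each node, and the graph Fourier transform projects it onto the eigenvectors of the graph Laplacian $\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$: $\hat{\mathbf{x}} = \mathbf{U}^\top\mathbf{x}$ and $\mathbf{x} = \mathbf{U}\hat{\mathbf{x}}$. The sketch below assumes the combinatorial Laplacian $\mathbf{D}-\mathbf{A}$ (the paper's Definition 9 may use a normalized variant) and checks two textbook identities: perfect reconstruction, and $\mathbf{x}^\top\mathbf{L}\mathbf{x} = \sum_i \lambda_i \hat{x}_i^2$, which is why components with small eigenvalues are called low-frequency.

```python
import numpy as np


def graph_fourier(adj, x):
    """Return eigenvalues, eigenvectors, and GFT coefficients of signal x on an undirected graph."""
    lap = np.diag(adj.sum(axis=1)) - adj
    lam, U = np.linalg.eigh(lap)  # ascending eigenvalues act as graph frequencies
    return lam, U, U.T @ x


def inverse_graph_fourier(U, x_hat):
    """Map Fourier coefficients back to the node domain."""
    return U @ x_hat


# Usage: a 4-cycle with a scalar signal on each node.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 0.5, -1.0])
lam, U, x_hat = graph_fourier(adj, x)
```

Here $\mathbf{x}^\top\mathbf{L}\mathbf{x}$ also equals the sum of squared differences across edges, $(1-2)^2 + (2-0.5)^2 + (0.5+1)^2 + (-1-1)^2 = 9.5$, so a signal that varies sharply between neighbors places energy on high frequencies.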