Temporal-Aware Evaluation and Learning for Temporal Graph Neural Networks
Junwei Su, Shan Wu
TL;DR
Temporal Graph Neural Networks (TGNNs) achieve strong performance on dynamic graphs, but existing evaluation metrics fail to capture temporal error structure, notably volatility clustering. The authors formalize metric expressiveness, prove the inadequacy of instance-based metrics, and introduce Volatility-Cluster Statistics (VCS) to detect temporal error clustering, along with Volatility-Cluster-Aware (VCA) learning to regularize models toward more uniform error distributions. They validate the approach across five datasets and six state-of-the-art TGNNs, showing that volatility patterns vary by architecture and that VCA reduces volatility clusters with manageable impact on predictive accuracy. This work enables temporally robust evaluation and training of TGNNs, with practical implications for fault-tolerant and real-time systems that depend on stable error dynamics.
Abstract
Temporal Graph Neural Networks (TGNNs) are a family of graph neural networks designed to model and learn dynamic information from temporal graphs. Their substantial empirical success has drawn growing interest from the research community. However, most of this effort has been channelled towards algorithm and system design, while evaluation metrics have received comparatively little attention. Effective evaluation metrics are crucial for providing detailed performance insights, particularly in the temporal domain. This paper investigates the commonly used evaluation metrics for TGNNs and illustrates how these metrics fail to capture essential temporal structures in the predictive behaviour of TGNNs. We provide a mathematical formulation of existing performance metrics and use an instance-based study to show that they cannot identify volatility clustering (the concentration of errors within brief time intervals). This phenomenon has profound implications for both algorithm and system design in the temporal domain. To address this deficiency, we introduce a new volatility-aware evaluation metric (termed volatility cluster statistics), designed for a more refined analysis of a model's temporal performance. Additionally, we demonstrate how this metric can serve as a temporal-volatility-aware training objective to alleviate the clustering of temporal errors. Through comprehensive experiments on various TGNN models, we validate our analysis and the proposed approach. The empirical results offer three insights: 1) existing TGNNs are prone to making errors that exhibit volatility clustering, 2) TGNNs with different mechanisms for capturing temporal information exhibit distinct volatility clustering patterns, and 3) the proposed training objective effectively reduces volatility clusters in the errors.
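To make the notion of volatility clustering concrete, the following minimal Python sketch compares two hypothetical error sequences that contain the same number of errors. It is an illustration only: the sequences are invented, and the helper max_errors_in_window is a simple stand-in chosen for this example, not the volatility cluster statistics defined in the paper. The point is that an instance-based metric such as accuracy depends only on how many errors occur, not on when they occur.

```python
def accuracy(errors):
    """Instance-based metric: fraction of correct predictions.

    `errors` is a time-ordered list of 0/1 flags (1 = wrong prediction).
    """
    return 1.0 - sum(errors) / len(errors)


def max_errors_in_window(errors, window):
    """Largest number of errors in any run of `window` consecutive predictions.

    Hypothetical helper for illustration; NOT the paper's VCS definition.
    """
    return max(
        sum(errors[i:i + window])
        for i in range(len(errors) - window + 1)
    )


# Two invented sequences of 20 time-ordered predictions, 4 errors each.
spread = [1, 0, 0, 0, 0] * 4                  # errors evenly spaced in time
clustered = [0] * 8 + [1, 1, 1, 1] + [0] * 8  # errors packed into one burst

acc_spread = accuracy(spread)
acc_clustered = accuracy(clustered)
peak_spread = max_errors_in_window(spread, window=5)
peak_clustered = max_errors_in_window(clustered, window=5)

print(acc_spread, acc_clustered)      # identical accuracy for both sequences
print(peak_spread, peak_clustered)    # the windowed count separates them
```

Both sequences receive the same accuracy, while the windowed count differs, which is the kind of temporal structure that a time-aware statistic is meant to expose.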
