What Do Temporal Graph Learning Models Learn?
Abigail J. Hayes, Tobias Schumacher, Markus Strohmaier
TL;DR
The paper addresses the reliability and interpretability of temporal graph benchmarks by asking which of eight intuitive graph properties temporal graph learning models actually learn. It introduces a property-based evaluation framework and systematically tests seven models on synthetic and real-world datasets. The results are mixed: models reliably learn some mechanisms, such as preferential attachment, but struggle with edge direction, density, and recency, and only a subset capture persistence or periodicity. These findings expose fundamental limitations of current temporal graph learners and motivate interpretability-driven evaluation and targeted model improvements. In practice, the work can guide practitioners in selecting and calibrating models for tasks where specific temporal properties matter, and it suggests directions for developing models that better capture neglected dynamics.
Abstract
Learning on temporal graphs has become a central topic in graph representation learning, with numerous benchmarks indicating the strong performance of state-of-the-art models. However, recent work has raised concerns about the reliability of benchmark results, noting issues with commonly used evaluation protocols and the surprising competitiveness of simple heuristics. This contrast raises the question of which properties of the underlying graphs temporal graph learning models actually use to form their predictions. We address this by systematically evaluating seven models on their ability to capture eight fundamental attributes related to the link structure of temporal graphs. These include structural characteristics such as density, temporal patterns such as recency, and edge formation mechanisms such as homophily. Using both synthetic and real-world datasets, we analyze how well models learn these attributes. Our findings reveal a mixed picture: models capture some attributes well but fail to reproduce others, which exposes important limitations of current models. Overall, we believe that our results provide practical insights for the application of temporal graph learning models and motivate more interpretability-driven evaluations in temporal graph learning research.
