Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction

Tao Zou, Chengfeng Wu, Tianxi Liao, Junchen Ye, Bowen Du

TL;DR

This paper addresses scalable dynamic graph learning for temporal link prediction by questioning the necessity of self-attention. It introduces GLFormer, an attention-free Transformer-style architecture with an adaptive token mixer that uses temporal order and time gaps, and a hierarchical aggregation mechanism to capture long-range dependencies. Empirical results on six benchmarks show GLFormer achieves state-of-the-art performance with substantially lower inference cost than attention-based Transformers, validating the viability of attention-free designs for dynamic graphs. The approach offers a practical, scalable solution for real-time dynamic link prediction in large-scale, high-frequency graphs.

Abstract

Dynamic graph learning plays a pivotal role in modeling evolving relationships over time, especially for temporal link prediction tasks in domains such as traffic systems, social networks, and recommendation platforms. While Transformer-based models have demonstrated strong performance by capturing long-range temporal dependencies, their reliance on self-attention results in quadratic complexity with respect to sequence length, limiting scalability on high-frequency or large-scale graphs. In this work, we revisit the necessity of self-attention in dynamic graph modeling. Inspired by recent findings that attribute the success of Transformers more to their architectural design than to attention itself, we propose GLFormer, a novel attention-free Transformer-style framework for dynamic graphs. GLFormer introduces an adaptive token mixer that performs context-aware local aggregation based on interaction order and time intervals. To capture long-term dependencies, we further design a hierarchical aggregation module that expands the temporal receptive field by stacking local token mixers across layers. Experiments on six widely used dynamic graph benchmarks show that GLFormer achieves state-of-the-art (SOTA) performance, showing that attention-free architectures can match or surpass Transformer baselines in dynamic graph settings with significantly improved efficiency.
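The two ideas in the abstract, local aggregation driven by interaction order and time intervals, and a receptive field that grows as local mixers are stacked, can be illustrated with a minimal sketch. This is not the paper's formulation: the softmax scoring rule, the `order_bias` and `decay` parameters, and the function names `adaptive_local_mix` and `receptive_field` are all assumptions chosen for illustration.

```python
import math

def adaptive_local_mix(tokens, times, window=3, order_bias=None, decay=0.1):
    """Mix each token with up to `window` most recent tokens (itself included).

    tokens: list of feature vectors (lists of floats), sorted by ascending time.
    times:  list of timestamps, one per token.
    The scoring rule below is a hypothetical choice, not the paper's:
    score(i, j) = order_bias[i - j] - decay * (times[i] - times[j]).
    """
    if order_bias is None:
        order_bias = [0.0] * window  # one (assumed learnable) bias per order offset
    dim = len(tokens[0])
    mixed = []
    for i in range(len(tokens)):
        lo = max(0, i - window + 1)
        neighbors = range(lo, i + 1)
        scores = [order_bias[i - j] - decay * (times[i] - times[j]) for j in neighbors]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed.append([sum(w * tokens[j][d] for w, j in zip(weights, neighbors))
                      for d in range(dim)])
    return mixed

def receptive_field(num_layers, window):
    """Number of past tokens (itself included) a token can see after stacking."""
    return num_layers * (window - 1) + 1

# Each layer costs O(n * window) rather than the O(n^2) of full self-attention.
tokens = [[1.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
hidden = tokens
for _ in range(2):  # stack two local mixers
    hidden = adaptive_local_mix(hidden, times, window=3)
reached = sum(1 for h in hidden if h[0] > 0.0)
print(reached, receptive_field(2, 3))
```

In this toy run, an impulse placed on the first token spreads to later positions one window at a time, which is the sense in which stacking local mixers widens the temporal receptive field without any pairwise attention over the full sequence.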

Paper Structure

This paper contains 20 sections, 9 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Illustration of a dynamic graph $G$ that evolves from $t_0$ to $t_8$ in (a). We aim to predict whether $u_2$ will interact with $u_1$ at timestamp $t_9$. (b) To capture the temporal dependencies among neighbors, existing works use self-attention mechanisms to learn these correlations.
  • Figure 2: We show the average precision results for dynamic link prediction on four datasets with three types of token aggregation mechanisms.
  • Figure 3: Framework of GLFormer.
  • Figure 4: Performance of different numbers of GLFormer layers in transductive dynamic link prediction.
  • Figure 5: Log-scale evaluation time for various methods.
  • ...and 1 more figure