Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction

Tao Zou; Chengfeng Wu; Tianxi Liao; Junchen Ye; Bowen Du

TL;DR

This paper addresses scalable dynamic graph learning for temporal link prediction by questioning the necessity of self-attention. It introduces GLFormer, an attention-free Transformer-style architecture with an adaptive token mixer that uses temporal order and time gaps, and a hierarchical aggregation mechanism to capture long-range dependencies. Empirical results on six benchmarks show GLFormer achieves state-of-the-art performance with substantially lower inference cost than attention-based Transformers, validating the viability of attention-free designs for dynamic graphs. The approach offers a practical, scalable solution for real-time dynamic link prediction in large-scale, high-frequency graphs.

Abstract

Dynamic graph learning plays a pivotal role in modeling evolving relationships over time, especially for temporal link prediction tasks in domains such as traffic systems, social networks, and recommendation platforms. While Transformer-based models have demonstrated strong performance by capturing long-range temporal dependencies, their reliance on self-attention results in quadratic complexity with respect to sequence length, limiting scalability on high-frequency or large-scale graphs. In this work, we revisit the necessity of self-attention in dynamic graph modeling. Inspired by recent findings that attribute the success of Transformers more to their architectural design than to attention itself, we propose GLFormer, a novel attention-free Transformer-style framework for dynamic graphs. GLFormer introduces an adaptive token mixer that performs context-aware local aggregation based on interaction order and time intervals. To capture long-term dependencies, we further design a hierarchical aggregation module that expands the temporal receptive field by stacking local token mixers across layers. Experiments on six widely used dynamic graph benchmarks show that GLFormer achieves state-of-the-art performance, indicating that attention-free architectures can match or surpass Transformer baselines in dynamic graph settings with significantly improved efficiency.
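
To make the two ideas in the abstract concrete, the following is a minimal pure-Python sketch of a causal local token mixer and of how stacking such mixers widens the temporal receptive field. It is not the authors' implementation: the function names (`local_mix`, `stacked_mix`, `receptive_field`), the `order_logits` and `decay` parameters, and the softmax-style weighting over order offsets and time gaps are all assumptions chosen for illustration.

```python
import math


def local_mix(tokens, times, window, order_logits, decay):
    """Mix each token with the `window` most recent tokens (itself included).

    Assumed weighting (illustrative only): a per-offset logit for the
    interaction order, minus `decay` times the time gap, then a softmax.
    """
    mixed = []
    for i in range(len(tokens)):
        start = max(0, i - window + 1)
        positions = range(start, i + 1)
        # Larger offsets and larger time gaps can receive lower scores.
        scores = [order_logits[i - j] - decay * (times[i] - times[j])
                  for j in positions]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(tokens[i])
        mixed.append([
            sum(w * tokens[j][d] for w, j in zip(weights, positions)) / total
            for d in range(dim)
        ])
    return mixed


def stacked_mix(tokens, times, window, num_layers, order_logits, decay):
    """Apply the local mixer `num_layers` times in sequence."""
    for _ in range(num_layers):
        tokens = local_mix(tokens, times, window, order_logits, decay)
    return tokens


def receptive_field(window, num_layers):
    """Number of past positions (self included) that can reach one output."""
    return 1 + num_layers * (window - 1)


if __name__ == "__main__":
    # An impulse at position 0 shows how far information travels per layer.
    sequence = [[1.0]] + [[0.0]] * 5
    timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    for layers in (1, 2):
        out = stacked_mix(sequence, timestamps, 3, layers, [0.0, 0.0, 0.0], 0.1)
        reached = [i for i, tok in enumerate(out) if tok[0] > 0]
        print(layers, receptive_field(3, layers), reached)
```

Under these assumptions, each layer costs time linear in the sequence length for a fixed window, whereas self-attention compares every pair of positions. With a window of 3, a single layer lets an output see 3 positions and two layers let it see 5, which is the sense in which stacking local mixers expands the receptive field.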
