Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization

Mahdi Biparva, Raika Karimi, Faezeh Faez, Yingxue Zhang

TL;DR

The paper addresses over-smoothing and over-squashing in dynamic-graph learning by introducing Todyformer, a tokenized dynamic graph Transformer. It combines local, structure-aware MPNN-based tokenization with global Transformer modeling through a patch-based, temporally aware encoding scheme that alternates between local and global processing. Key contributions are patch generation to bound neighborhood growth, learnable structure-aware tokenization via a DyG2Vec-inspired GNN, a temporal Transformer with dedicated masking, and an alternating encoder design that mitigates common GNN limitations. Empirical results on future link prediction (FLP), dynamic node classification (DNC), and the large-scale TGBL benchmarks show state-of-the-art performance and favorable efficiency, supporting the practical viability of multi-scale dynamic graph Transformers.

Abstract

Temporal Graph Neural Networks have garnered substantial attention for their capacity to model evolving structural and temporal patterns while exhibiting impressive performance. However, it is known that these architectures are encumbered by issues that constrain their performance, such as over-squashing and over-smoothing. Meanwhile, Transformers have demonstrated exceptional computational capacity to effectively address challenges related to long-range dependencies. Consequently, we introduce Todyformer, a novel Transformer-based neural network tailored for dynamic graphs. It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers through i) a novel patchifying paradigm for dynamic graphs to alleviate over-squashing, ii) a structure-aware parametric tokenization strategy leveraging MPNNs, iii) a Transformer with temporal positional encoding to capture long-range dependencies, and iv) an encoding architecture that alternates between local and global contextualization, mitigating over-smoothing in MPNNs. Experimental evaluations on public benchmark datasets demonstrate that Todyformer consistently outperforms the state-of-the-art methods for downstream tasks. Furthermore, we illustrate the underlying aspects of the proposed model in effectively capturing extensive temporal dependencies in dynamic graphs.
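
The patchifying idea in point i) can be illustrated with a small sketch. The abstract does not specify how patches are formed, so the code below assumes one plausible scheme: sort the temporal edges chronologically and cut them into contiguous, non-overlapping chunks of near-equal size. The names `Edge`, `make_patches`, and `nodes_in_patch` are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Set, Tuple

# Assumed edge format for this sketch: (source node, destination node, timestamp).
Edge = Tuple[int, int, float]


def make_patches(edges: List[Edge], num_patches: int) -> List[List[Edge]]:
    """Split a temporal edge list into contiguous, non-overlapping patches.

    Assumption: patches hold a near-equal number of edges in time order.
    """
    if num_patches <= 0:
        raise ValueError("num_patches must be positive")
    ordered = sorted(edges, key=lambda e: e[2])  # chronological order
    size, extra = divmod(len(ordered), num_patches)
    patches: List[List[Edge]] = []
    start = 0
    for i in range(num_patches):
        # The first `extra` patches take one additional edge each.
        end = start + size + (1 if i < extra else 0)
        patches.append(ordered[start:end])
        start = end
    return patches


def nodes_in_patch(patch: List[Edge]) -> Set[int]:
    """Collect the nodes touched by any edge in one patch."""
    return {node for u, v, _ in patch for node in (u, v)}


edges = [(0, 1, 5.0), (1, 2, 1.0), (2, 3, 3.0),
         (0, 3, 2.0), (3, 4, 6.0), (1, 4, 4.0)]
patches = make_patches(edges, 3)
for index, patch in enumerate(patches):
    print(index, patch, sorted(nodes_in_patch(patch)))
```

Under this assumption, a local encoder would only pass messages among the edges inside one patch, which caps how far a node's neighborhood can grow, while relating a node's representations across patches is left to the global Transformer.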

Paper Structure

This paper contains 31 sections, 10 equations, 5 figures, and 14 tables.

Figures (5)

  • Figure 1: Illustration of Todyformer encoding-decoding architecture.
  • Figure 2: Schematic depiction of the computation flow in the local and global encoding modules, particularly highlighting node packing and unpacking modules in Todyformer.
  • Figure 3: Sensitivity analysis on the number of patches and the input window size on MOOC and LastFM. The left plot fixes the input window size at 262144, while the right plot fixes the number of patches at 32.
  • Figure 4: The performance versus inference time across LastFM, SocialEvol, and MOOC datasets.
  • Figure 5: Sensitivity analysis on the number of layers and blocks conducted on the MOOC dataset.