HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers

Maciej Besta; Afonso Claudino Catarino; Lukas Gianinazzi; Nils Blach; Piotr Nyczyk; Hubert Niewiadomski; Torsten Hoefler

TL;DR

HOT addresses dynamic link prediction on rapidly evolving graphs by integrating higher-order ($k$-hop) neighborhood structures into a Transformer-based dynamic graph learning framework. It encodes higher-order (HO) interactions into the attention mechanism and employs a Block-Recurrent Transformer to bound memory while producing robust temporal node representations for prediction. Empirically, HOT achieves up to $9\%$, $7\%$, and $15\%$ higher accuracy than DyGFormer, TGN, and GraphMixer, respectively, on the MOOC dataset, with strong generalization to other dynamic GRL tasks. The approach offers a scalable path to leveraging HO graph information in dynamic settings and can be extended to tasks beyond link prediction, such as dynamic node classification and regression.

Abstract

Many graph representation learning (GRL) problems are dynamic, with millions of edges added or removed per second. A fundamental workload in this setting is dynamic link prediction: using a history of graph updates to predict whether a given pair of vertices will become connected. Recent schemes for link prediction in such dynamic settings employ Transformers, modeling individual graph updates as single tokens. In this work, we propose HOT: a model that enhances this line of work by harnessing higher-order (HO) graph structures; specifically, $k$-hop neighbors and more general subgraphs containing a given pair of vertices. Encoding such HO structures into the attention matrix of the underlying Transformer yields more accurate link predictions, but at the expense of increased memory pressure. To alleviate this, we resort to a recent class of schemes that impose hierarchy on the attention matrix, significantly reducing the memory footprint. The final design offers a sweet spot between high accuracy and low memory utilization. HOT outperforms other dynamic GRL schemes, for example achieving 9%, 7%, and 15% higher accuracy than DyGFormer, TGN, and GraphMixer, respectively, on the MOOC dataset. Our design can be seamlessly extended towards other dynamic GRL workloads.
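To make the notion of $k$-hop neighbors in a dynamic graph concrete, the following is a minimal sketch of how such neighbors could be collected from a history of timestamped interactions. It is an illustration under stated assumptions, not the extraction procedure used by HOT: the function names `build_history` and `k_hop_neighbors`, the `(u, v, t)` event format, and the simple rule "use every interaction strictly before the query time" are all hypothetical choices made here for clarity.

```python
from collections import defaultdict


def build_history(events):
    """Index undirected interactions (u, v, t) by endpoint.

    Assumption: each event is a tuple (source, destination, timestamp).
    """
    history = defaultdict(list)
    for u, v, t in events:
        history[u].append((v, t))
        history[v].append((u, t))
    return history


def k_hop_neighbors(history, node, before, k):
    """Return {neighbor: hop_distance} for nodes within k hops of `node`.

    Only interactions with timestamp strictly less than `before` are used.
    Simplification: timestamps along a path are not required to be ordered.
    """
    hops = {node: 0}
    frontier = [node]
    for hop in range(1, k + 1):
        next_frontier = []
        for u in frontier:
            # .get avoids creating empty entries for unseen nodes
            for v, t in history.get(u, ()):
                if t < before and v not in hops:
                    hops[v] = hop
                    next_frontier.append(v)
        frontier = next_frontier
    del hops[node]  # the query node is not its own neighbor
    return hops


# Hypothetical toy interaction stream
events = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 3.0), ("a", "e", 5.0)]
history = build_history(events)
one_hop = k_hop_neighbors(history, "a", before=4.0, k=1)
two_hop = k_hop_neighbors(history, "a", before=4.0, k=2)
```

With `k=1` this reduces to the first-order neighborhood that single-token-per-update models already see; larger `k` exposes additional vertices, which is also why the token sequence, and hence the attention matrix, grows as higher-order structure is included.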

Paper Structure (38 sections, 25 equations, 3 figures, 1 table)

Introduction
Background
Graph Model and Representation
The HOT Model
Extracting Higher-Order Neighbors
Constructing Input Feature Matrices
Encoding Higher-Order Neighbor Interactions
Patching, Alignment, Concatenation
Harnessing Temporal Hierarchy with Block-Recurrent Transformer
Computational Cost
Evaluation
Experimental Setup
Analysis of Performance
Analysis of Higher-Order (HO) Characteristics
Analysis of Memory Consumption
...and 23 more sections

Figures (3)

Figure 1: Illustration of a temporal higher-order example and an overview of the HOT model.
Figure 2: AP (%) and AUC (%) scores on the MOOC, LastFM, and CanParl datasets using the various negative edge sampling techniques (RNES, HNES, INES) in the transductive setting, and using the random negative edge sampling technique in the inductive setting (Ind). Baseline results are the best ones reported by DyGFormer.
Figure 3: The analysis of the impact of the block and patch size on memory utilization.
