From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs

Ahmad Naser Eddin, Jacopo Bono, David Aparício, Hugo Ferreira, João Ascensão, Pedro Ribeiro, Pedro Bizarro

TL;DR

This work proposes graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models.

Abstract

Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models. To achieve this, we propose a streaming, low-latency approximation to random-walk based features. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare it with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduces inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.

Paper Structure

This paper contains 25 sections, 9 equations, 3 figures, 8 tables, and 1 algorithm.

Figures (3)

  • Figure 2: AUC vs. runtime trade-off. Our proposed methods based on graph-sprints (GS or GS-Raw) allow for low-latency inference while outperforming state-of-the-art methods in terms of AUC on node classification tasks. The x-axis shows the time in seconds to process 200 batches of size 200; the y-axis shows test AUC. Error bars denote the standard deviation over 10 random seeds.
  • Figure 3: From random-walks to graph-sprints. Edges have a timestamp feature (numbers) representing the time at which a relationship was created. (A) A temporal random-walk is traversed from the most recent interaction A-B towards older interactions. (B) The same random-walk can be seen as a time-series of edges. (C) Based on a full temporal random-walk, one can compute embeddings by aggregating the encountered feature values (see the section on random-walk based features). (D) One can compute similar embeddings in a streaming setting, from only the new edge and the existing embeddings of the involved nodes.
  • Figure 4: Speedup vs. number of edges. The speedup increases almost linearly with the number of edges in the graph.
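
The streaming idea in panel (D) of Figure 3 can be illustrated with a small Python sketch. This is not the paper's update rule, which is not given in this summary. It assumes a simple exponential-moving-average blend, and the names `make_state`, `update_on_edge`, and the parameter `beta` are hypothetical. The point it shows is that a single-hop operation on a new edge can still carry multi-hop information, because the neighbour's stored embedding already summarises older edges.

```python
from collections import defaultdict

import numpy as np


def make_state(dim):
    """Return a store mapping node id -> embedding (zeros for unseen nodes)."""
    return defaultdict(lambda: np.zeros(dim))


def update_on_edge(state, src, dst, edge_features, beta=0.5):
    """Update the embedding of `src` from one new edge src -> dst.

    Assumed (hypothetical) rule: blend the neighbour's current embedding,
    which already summarises older history, with the new edge's features.
    Only the two nodes on the new edge are touched; no walk is sampled.
    """
    edge_features = np.asarray(edge_features, dtype=float)
    state[src] = beta * state[dst] + (1.0 - beta) * edge_features
    return state[src]


# Usage: the older edge B-C arrives first, then the newer edge A-B.
state = make_state(dim=2)
update_on_edge(state, "B", "C", [1.0, 0.0])  # B becomes [0.5, 0.0]
update_on_edge(state, "A", "B", [0.0, 1.0])  # A becomes [0.25, 0.5]
```

After the second update, the first component of A's embedding is non-zero even though A never interacted with C directly: the information from edge B-C reached A through B's stored embedding, with older edges weighted down by `beta` at each hop.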