Leveraging Temporal Graph Networks Using Module Decoupling

Or Feldman, Chaim Baskin

TL;DR

The paper addresses the bottleneck of missing updates in memory-based dynamic-graph learning when operating under streaming conditions that rely on batching. It introduces a decoupling strategy that separates memory and prediction modules, enabling frequent memory updates with small memory batches while using larger batch sizes for prediction by leveraging a saved neighborhood view. Building on EdgeBank, the Lightweight Decoupled Temporal Graph Network (LDTGN) and its LDTGN-mem variant deliver high throughput and competitive or state-of-the-art accuracy on transductive and inductive future-edge prediction benchmarks. The approach significantly improves throughput with a minimal parameter footprint and demonstrates robustness across diverse dynamic-graph datasets, positioning it as a practical solution for real-time dynamic graph tasks.

Abstract

Modern approaches for learning on dynamic graphs have adopted the use of batches instead of applying updates one by one. The use of batches allows these techniques to become helpful in streaming scenarios where updates to graphs are received at extreme speeds. Using batches, however, forces the models to update infrequently, which results in the degradation of their performance. In this work, we suggest a decoupling strategy that enables the models to update frequently while using batches. By decoupling the core modules of temporal graph networks and implementing them using a minimal number of learnable parameters, we have developed the Lightweight Decoupled Temporal Graph Network (LDTGN), an exceptionally efficient model for learning on dynamic graphs. LDTGN was validated on various dynamic graph benchmarks, providing comparable or state-of-the-art results with significantly higher throughput than previous art. Notably, our method outperforms previous approaches by more than 20% on benchmarks that require rapid model update rates, such as USLegis or UNTrade. The code to reproduce our experiments is available at https://orfeld415.github.io/module-decoupling.
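
The decoupling idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the memory is a plain per-node list of (neighbor, timestamp) pairs, and the names `decoupled_step` and `edgebank_like_predict` are hypothetical. The predictor is a simple EdgeBank-style lookup standing in for the learned prediction module. The point is only to show how small memory batches can be applied inside one large prediction batch, with a saved view per input so that later updates do not leak into earlier predictions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Event = Tuple[int, int, float]                 # (source, destination, timestamp)
NodeView = List[Tuple[int, float]]             # remembered (neighbor, timestamp) pairs
Memory = DefaultDict[int, NodeView]


def decoupled_step(memory: Memory, batch: List[Event],
                   memory_batch_size: int) -> List[Tuple[NodeView, NodeView]]:
    """Apply a prediction batch as several smaller memory batches.

    Returns one saved (source view, destination view) pair per input, taken
    before the input's own memory batch is written to memory.
    """
    if memory_batch_size <= 0:
        raise ValueError("memory_batch_size must be positive")
    views: List[Tuple[NodeView, NodeView]] = []
    for start in range(0, len(batch), memory_batch_size):
        chunk = batch[start:start + memory_batch_size]
        # Save a view of the current memory for every input in this chunk.
        for src, dst, _ in chunk:
            views.append((list(memory[src]), list(memory[dst])))
        # Only then write the chunk's edges into memory.
        for src, dst, t in chunk:
            memory[src].append((dst, t))
            memory[dst].append((src, t))
    return views


def edgebank_like_predict(batch: List[Event],
                          views: List[Tuple[NodeView, NodeView]]) -> List[bool]:
    """Hypothetical stand-in predictor: an edge is predicted if the saved
    source view already remembers the destination as a neighbor."""
    return [
        any(neighbor == dst for neighbor, _ in src_view)
        for (_, dst, _), (src_view, _) in zip(batch, views)
    ]
```

In this sketch, setting `memory_batch_size` equal to the prediction batch size reproduces the coupled behavior, where no input sees any update from its own batch. Smaller values let later inputs see earlier edges of the same batch while prediction still runs once over the full batch.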

Paper Structure (25 sections, 20 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: The incidence of missing updates in real-world datasets as a function of the batch size and their impact on the performance of TGN. In panel (a), the ratio of inputs that depend on at least a single missing update increases significantly as the batch size increases. In panel (b), the average number of missing updates per input increases as the batch size increases. In panel (c), the performance of TGN corresponds to the extent of missing updates, where a high incidence of missing updates indicates a significant performance decrease.
  • Figure 2: Illustration of a dynamic graph at $t_4$ for the task of predicting the edge $(v_3,v_6)$. The state of a memory-based model is compared to the state of a model operating using the proposed decoupling strategy. The memory-based model was updated prior to $t_{1}$ and, therefore, does not contain $(v_1,v_2)$, $(v_2,v_3)$, and $(v_4,v_5)$. The model that follows the decoupling strategy and applies inner batch updates was previously updated at $t_{2}$ and, therefore, closely resembles the ground truth and is missing only $(v_4,v_5)$.
  • Figure 3: Comparison of running times for decoupled TGN with a constant memory batch size of 50 and varying batch sizes on the test set of the Wikipedia dataset. The running times are normalized by the baseline scenario where both the memory batch size and the batch size are set to 50.
  • Figure 4: Framework of the proposed model. The batch of updates and inputs is first divided into memory batches and a single batch of inputs. Then, the new edges and their corresponding timestamps are saved in the memory. In LDTGN-mem, the state of each node in the memory batch is updated using the $\mathrm{msg}$, $\mathrm{agg}$, and $\mathrm{mem}$ functions. Before each update, the relevant information is saved in a memory view to prevent it from being overwritten. Next, the information of each input node is extracted from the appropriate memory view. Then, $\mathrm{TDE}$ is applied to the time differences between the inputs and the extracted timestamps. Neighborhood information is aggregated using learnable attention weights to create a single encoding for each node. Finally, the node encodings and the edge encoding are merged using the $\mathrm{merge}$ function, and the combined encoding is used to get the final prediction.
  • Figure 5: Average number of learnable parameters used by the baselines and our model. The black ranges indicate the standard deviation of the average number of learnable parameters.
  • ...and 1 more figures
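
The two quantities plotted in Figure 1 (a) and (b) can be approximated with a short sketch. It rests on an assumption that the captions do not confirm: an input counts as depending on a missing update when an earlier event in the same batch shares at least one endpoint with it. The paper's exact definition may differ, for example by considering multi-hop neighborhoods. The function name `missing_update_stats` is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, int, float]  # (source, destination, timestamp)


def missing_update_stats(events: List[Event], batch_size: int) -> Tuple[float, float]:
    """Return (ratio of inputs with at least one missing update,
    average number of missing updates per input).

    Assumption: a missing update for an input is an earlier event in the
    same batch that shares at least one endpoint with the input.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not events:
        return 0.0, 0.0
    affected = 0
    total_missing = 0
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        for i, (src, dst, _) in enumerate(batch):
            endpoints = {src, dst}
            # Count earlier same-batch events touching either endpoint.
            missing = sum(
                1 for prev_src, prev_dst, _ in batch[:i]
                if endpoints & {prev_src, prev_dst}
            )
            total_missing += missing
            if missing > 0:
                affected += 1
    return affected / len(events), total_missing / len(events)
```

Under this assumption, both values are zero for a batch size of 1 and can only stay the same or grow when a batch is extended with later events, which matches the trend the caption describes. The pairwise scan is quadratic in the batch size and is meant for illustration rather than for large streams.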