Simple yet Effective Node Property Prediction on Edge Streams under Distribution Shifts

Jongha Lee, Taehyung Kwon, Heechan Moon, Kijung Shin

TL;DR

This work tackles node property prediction on continuous-time dynamic graphs (CTDGs), where distribution shifts undermine traditional TGNNs. It introduces SPLASH, a simple yet effective framework that augments node features with three schemes (random, positional, and structural), automatically selects the most robust augmentations using linear models evaluated across multiple shift-aware splits, and uses SLIM, a lightweight MLP-based TGNN, to predict node properties in real time. On dynamic anomaly detection, dynamic node classification, and node affinity prediction across seven real-world datasets (plus synthetic shifts), SPLASH consistently outperforms baselines, especially under distribution shifts, while offering substantial gains in speed and parameter efficiency. The approach thus provides a practical, generalizable solution for robust dynamic graph learning, and the code and datasets are released for reproducibility.
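The three augmentation schemes can be illustrated on a toy edge stream. This is a minimal sketch under stated assumptions, not SPLASH's actual construction: the random scheme draws a seeded Gaussian vector per node; the positional scheme is assumed here to smooth random vectors over graph neighborhoods so that nearby nodes end up with similar features; and the structural scheme is assumed to use log-degree statistics so that structurally similar nodes get similar features regardless of location. All function names (build_adjacency, random_features, positional_features, structural_features) are hypothetical.

```python
import math
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency lists from a stream of (u, v, t) temporal edges."""
    adj = defaultdict(list)
    for u, v, _t in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def random_features(nodes, dim, seed=0):
    """Random scheme: an i.i.d. Gaussian vector per node (seeded for repeatability)."""
    rng = random.Random(seed)
    return {n: [rng.gauss(0.0, 1.0) for _ in range(dim)] for n in sorted(nodes)}


def positional_features(adj, dim, rounds=2, seed=0):
    """Hypothetical positional scheme: repeatedly average each node's random
    vector with its neighbors' vectors, so nearby nodes get similar features."""
    feats = random_features(adj, dim, seed)
    for _ in range(rounds):
        smoothed = {}
        for n, nbrs in adj.items():
            rows = [feats[n]] + [feats[m] for m in nbrs]
            smoothed[n] = [sum(col) / len(rows) for col in zip(*rows)]
        feats = smoothed
    return feats


def structural_features(adj):
    """Hypothetical structural scheme: log-degree and mean neighbor log-degree,
    so structurally similar nodes get similar features wherever they are."""
    logdeg = {n: math.log1p(len(nbrs)) for n, nbrs in adj.items()}
    return {
        n: [logdeg[n], sum(logdeg[m] for m in nbrs) / len(nbrs)]
        for n, nbrs in adj.items()
    }


# Toy stream: a triangle (0, 1, 2) and a separate edge (3, 4).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 3.0), (3, 4, 4.0)]
adj = build_adjacency(edges)
rnd = random_features(adj, dim=4)
pos = positional_features(adj, dim=4)
struct = structural_features(adj)
```

In this toy stream, the triangle nodes 0, 1, and 2 converge to one shared positional vector while nodes 3 and 4 share a different one, whereas the structural features only reflect degree patterns, so all triangle nodes receive identical structural vectors.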

Abstract

The problem of predicting node properties (e.g., node classes) in graphs has received significant attention due to its broad range of applications. Graphs from real-world datasets often evolve over time, with newly emerging edges and dynamically changing node properties, posing a significant challenge for this problem. In response, temporal graph neural networks (TGNNs) have been developed to predict dynamic node properties from a stream of emerging edges. However, our analysis reveals that most TGNN-based methods are (a) far less effective without proper node features and, due to their complex model architectures, (b) vulnerable to distribution shifts. In this paper, we propose SPLASH, a simple yet powerful method for predicting node properties on edge streams under distribution shifts. Our key contributions are as follows: (1) we propose feature augmentation methods and an automatic feature selection method for edge streams, which improve the effectiveness of TGNNs, (2) we propose a lightweight MLP-based TGNN architecture that is highly efficient and robust under distribution shifts, and (3) we conduct extensive experiments to evaluate the accuracy, efficiency, generalization, and qualitative performance of the proposed method and its competitors on dynamic node classification, dynamic anomaly detection, and node affinity prediction tasks across seven real-world datasets.
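The automatic feature selection step can be sketched as follows, under explicitly hypothetical choices: each candidate feature set is scored by a linear model (here, logistic regression trained by plain gradient descent) on several chronological train/validation splits, which stand in for SPLASH's shift-aware splits, and the candidate with the highest mean validation accuracy is kept. The split rule, the averaging criterion, and every function name are assumptions; a worst-case (minimum) criterion would be an equally plausible choice.

```python
import math


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Binary logistic regression (a linear model) fit by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0)


def chronological_splits(times, k):
    """Hypothetical shift-aware splits: train on everything before a cutoff and
    validate on the next time window, for k increasing cutoffs (needs n >= k + 1)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(order)
    return [
        (order[: n * j // (k + 1)], order[n * j // (k + 1): n * (j + 1) // (k + 1)])
        for j in range(1, k + 1)
    ]


def select_features(candidates, labels, times, k=3):
    """Score each candidate feature set by mean validation accuracy across splits."""
    splits = chronological_splits(times, k)
    scores = {}
    for name, X in candidates.items():
        accs = []
        for train_idx, val_idx in splits:
            w, b = train_logreg([X[i] for i in train_idx], [labels[i] for i in train_idx])
            hits = sum(predict(w, b, X[i]) == labels[i] for i in val_idx)
            accs.append(hits / len(val_idx))
        scores[name] = sum(accs) / len(accs)
    return max(scores, key=scores.get), scores


labels = [i % 2 for i in range(12)]
times = list(range(12))
candidates = {
    "informative": [[1.0 if y else -1.0] for y in labels],
    "constant": [[0.0] for _ in labels],
}
best, scores = select_features(candidates, labels, times)
```

On this toy data, the "informative" feature separates the classes on every split, while the "constant" feature can only reproduce the training majority class, so it scores lower and is discarded.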

Paper Structure

This paper contains 28 sections, 20 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: An example of distribution shifts in a collaboration network from a company with two departments. (a) shows the network before the distribution shift at time $t^{(6)}$, and (b) shows the network after the shift. Note that node U5's community membership shifts from Department A to B over time.
  • Figure 2: An example graph with (a) positional node features and (b) structural node features. The blue arrows indicate node pairs with similar node features.
  • Figure 3: Examples of distribution shifts in edge streams: (a) positional, (b) structural, and (c) property distribution shifts over time in the Reddit dataset. In (a), nodes are grouped based on their appearance time, and the node embeddings generated by node2vec (Grover and Leskovec, 2016) using the entire graph are averaged within each group. These averaged embeddings are visualized using t-SNE.
  • Figure 4: An example of node property prediction in a CTDG over time. This process involves a memory that stores a summary or sample of the CTDG. Whenever a temporal edge arrives, the memory is updated, and whenever a label query is received, a model (e.g., a TGNN) makes a prediction based on the memory as updated up to that time point.
  • Figure 5: An outline of SPLASH. In the training phase, for a given training CTDG, SPLASH (1) generates augmented node features through feature augmentation, (2) identifies task-relevant features using feature selection, and (3) trains our proposed SLIM model with the selected augmented features. In the test phase, for a given test CTDG, SPLASH (1) generates the selected augmented features for nodes unseen during training through feature propagation and (2) predicts node properties using the trained SLIM model.
  • ...and 9 more figures
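The prediction loop of Figure 4, combined with the feature propagation step for unseen nodes from Figure 5, can be sketched as a small event processor. Every detail here is a hypothetical simplification: the memory keeps each node's most recent neighbors in a bounded deque, an unseen node's feature is the mean of its known neighbors' features (a one-hop stand-in for feature propagation), and `model` is any callable standing in for the trained SLIM model. None of these names come from the paper.

```python
from collections import defaultdict, deque


class NeighborMemory:
    """Stores each node's most recent neighbors (a bounded sample of the CTDG)."""

    def __init__(self, max_neighbors=10):
        self.recent = defaultdict(lambda: deque(maxlen=max_neighbors))

    def update(self, u, v, t):
        # An arriving temporal edge is recorded in both endpoints' memories.
        self.recent[u].append((v, t))
        self.recent[v].append((u, t))

    def neighbors(self, node, until):
        # Only neighbors observed up to time `until` are visible to a query.
        return [n for n, t in self.recent.get(node, ()) if t <= until]


def node_feature(node, feats, memory, t, dim):
    """Known nodes keep their feature; an unseen node gets the mean of its
    known neighbors' features (a one-hop stand-in for feature propagation)."""
    if node in feats:
        return feats[node]
    known = [feats[n] for n in memory.neighbors(node, t) if n in feats]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def run_stream(events, feats, model, dim, max_neighbors=10):
    """('edge', u, v, t) updates the memory; ('query', node, t) calls the model
    on [node feature | mean neighbor feature] built from the current memory."""
    memory = NeighborMemory(max_neighbors)
    preds = []
    for event in events:
        if event[0] == "edge":
            _, u, v, t = event
            memory.update(u, v, t)
        else:
            _, node, t = event
            x = node_feature(node, feats, memory, t, dim)
            nbrs = [node_feature(n, feats, memory, t, dim)
                    for n in memory.neighbors(node, t)]
            agg = [sum(col) / len(nbrs) for col in zip(*nbrs)] if nbrs else [0.0] * dim
            preds.append((node, t, model(x + agg)))
    return preds


def model(z):
    # Hypothetical stand-in for a trained SLIM model: read the first coordinate.
    return z[0]


feats = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # "c" is unseen during training
events = [("edge", "a", "c", 1.0), ("query", "c", 1.5),
          ("edge", "b", "c", 2.0), ("query", "c", 2.5)]
preds = run_stream(events, feats, model, dim=2)
```

Because the memory only holds edges seen so far, the unseen node "c" receives a different propagated feature at each query: first it copies "a", then it averages "a" and "b".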

Theorems & Definitions (10)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Example 6
  • Example 7
  • Example 8
  • Example 9
  • Example 10