Rethinking Link Prediction for Directed Graphs

Mingguo He, Yuhe Guo, Yanping Zheng, Zhewei Wei, Stephan Günnemann, Xiaokui Xiao

TL;DR

This paper formalizes directed link prediction within a unified encoder–decoder framework and demonstrates that dual embeddings (a source embedding ${\mathbf{s}}_u$ and a target embedding ${\mathbf{t}}_u$ per node) capture directionality more effectively than single embeddings. It introduces DirLinkBench to standardize evaluation across seven real-world datasets and multiple metrics, revealing that simple models like DiGAE can outperform more complex approaches when decoders and losses are chosen appropriately. Building on these insights, the paper reinterprets DiGAE as a GCN on an undirected bipartite graph and presents SDGAE, a spectral directed graph auto-encoder that learns arbitrary polynomial filters with complexity $O(2K m d)$, achieving state-of-the-art average performance on DirLinkBench. The work also analyzes the impact of feature inputs, loss functions, decoder design, degree distributions, and negative sampling, and outlines open challenges, including developing more expressive decoders for complex-valued methods and better preserving asymmetry in directed graphs.

Abstract

Link prediction for directed graphs is a crucial task with diverse real-world applications. Recent advances in embedding methods and Graph Neural Networks (GNNs) have shown promising improvements. However, these methods often lack a thorough analysis of their expressiveness and lack effective benchmarks for a fair evaluation. In this paper, we propose a unified framework to assess the expressiveness of existing methods, highlighting the impact of dual embeddings and decoder design on directed link prediction performance. To address limitations in current benchmark setups, we introduce DirLinkBench, a robust new benchmark with comprehensive coverage, standardized evaluation, and modular extensibility. The results on DirLinkBench show that current methods struggle to achieve strong performance, while DiGAE outperforms other baselines overall. We further revisit DiGAE theoretically, showing its graph convolution aligns with GCN on an undirected bipartite graph. Inspired by these insights, we propose a novel Spectral Directed Graph Auto-Encoder (SDGAE) that achieves state-of-the-art average performance on DirLinkBench. Finally, we analyze key factors influencing directed link prediction and highlight open challenges in this field.
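
The bipartite reading of DiGAE mentioned in the abstract can be illustrated with a small numerical sketch. It assumes a DiGAE-style update in which source embeddings aggregate target embeddings along $A$ and target embeddings aggregate source embeddings along $A^\top$; degree normalization, weight matrices, and nonlinearities are omitted, and the helper name `bipartite_adjacency` is hypothetical rather than taken from the paper's code.

```python
import numpy as np


def bipartite_adjacency(A):
    """Symmetric 2n x 2n block matrix [[0, A], [A^T, 0]] built from a directed adjacency A.

    Rows/columns 0..n-1 play the role of "source copies" of the nodes and
    rows/columns n..2n-1 the "target copies" (an illustrative convention).
    """
    n = A.shape[0]
    Z = np.zeros((n, n))
    return np.block([[Z, A], [A.T, Z]])


# Toy directed graph with edges 0->1, 0->2, 1->2 (illustrative only).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 4))  # source embeddings, one row per node
T = rng.normal(size=(3, 4))  # target embeddings, one row per node

# One unnormalized propagation step on the undirected bipartite graph.
B = bipartite_adjacency(A)
stacked = B @ np.vstack([S, T])
S_next, T_next = stacked[:3], stacked[3:]

# The same step written as two directed aggregations.
S_direct = A @ T      # sources gather from the targets they point to
T_direct = A.T @ S    # targets gather from the sources pointing to them
```

Under these assumptions, a single propagation step on the symmetric block matrix reproduces the pair of directed aggregations, which is the sense in which the two views coincide in this sketch.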

Paper Structure

This paper contains 31 sections, 3 theorems, 18 equations, 11 figures, 12 tables.

Key Result

Proposition 3.2

Single methods (single real-valued embedding ${\mathbf{h}}_u$) with an asymmetric decoder function $\mathrm{MLP}({\mathbf{h}}_u \| {\mathbf{h}}_v)$ can capture graph structure and enable reconstruction for some specific directed graphs, but not arbitrary directed graphs, such as directed ring graphs.
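
As a rough illustration of why decoder and embedding design matter for directionality, the sketch below contrasts a symmetric inner-product decoder on a single embedding with an inner-product decoder on dual embeddings, using a directed ring as the toy graph. It does not reproduce or prove Proposition 3.2 (which concerns an asymmetric MLP decoder); the embeddings are hand-picked rather than learned, and all function names are hypothetical.

```python
import numpy as np


def ring_adjacency(n):
    """Adjacency matrix of the directed ring 0 -> 1 -> ... -> n-1 -> 0."""
    A = np.zeros((n, n))
    for u in range(n):
        A[u, (u + 1) % n] = 1.0
    return A


def symmetric_scores(H):
    """Single embedding with a symmetric decoder: score(u, v) = h_u . h_v."""
    return H @ H.T


def dual_scores(S, T):
    """Dual embeddings with decoder score(u, v) = s_u . t_v."""
    return S @ T.T


n = 5
A = ring_adjacency(n)

# Any single embedding yields a symmetric score matrix under this decoder,
# so score(u, v) == score(v, u) and edge direction cannot be expressed.
rng = np.random.default_rng(0)
H = rng.normal(size=(n, 8))
sym = symmetric_scores(H)

# Hand-picked dual embeddings: s_u is the u-th basis vector and T = A^T,
# so S @ T.T = I @ A = A and the ring is reconstructed exactly.
S = np.eye(n)
T = A.T
dual = dual_scores(S, T)
```

Here the dual-embedding scores equal the ring's adjacency matrix, while the symmetric decoder necessarily assigns the same score to $(u, v)$ and $(v, u)$ even though the ring contains only one of the two edges.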

Figures (11)

  • Figure 1: The results of MagNet as reported in the original paper, alongside the reproduced MagNet and MLP results.
  • Figure 2: The number of samples and the accuracy for each class of DUPLEX on the Cora and CiteSeer datasets in the 4C task.
  • Figure 3: The bipartite graph representation of two toy directed graphs.
  • Figure 4: Performance comparison of SDGAE, DiGAE, and DiGAE with residual connections (i.e., DiGAE$^*$) on the Cora-ML and CiteSeer datasets, with varying numbers of convolutional layers or polynomial orders.
  • Figure 5: Polynomial coefficients learned by SDGAE on the Cora-ML and CiteSeer datasets with $K = 5$.
  • ...and 6 more figures

Theorems & Definitions (7)

  • Definition 3.1
  • Proposition 3.2
  • Lemma 5.1
  • Lemma 5.2
  • proof
  • proof
  • proof