Understanding the Design Principles of Link Prediction in Directed Settings

Jun Zhai, Muberra Ozmen, Thomas Markovich

TL;DR

This work addresses directed link prediction by challenging the common symmetry assumption in graph representation learning. It systematically studies how directionality affects three design components—encoders, decoders, and structural features—and introduces DirLP, a principled framework combining a directed labeling scheme, a direction-aware DirGNN encoder, directed edge features, and an asymmetric decoder. Empirical results across six directed benchmarks show that directed design choices yield substantial gains over baselines, often surpassing state-of-the-art undirected GNNs. The findings highlight the practical importance of explicitly modeling directionality and provide concrete guidance for building scalable, directed link predictors in real-world graphs.
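
The TL;DR contrasts symmetric decoders with an asymmetric one. The minimal Python sketch below makes that difference concrete; it is not the authors' implementation. It compares a dot-product decoder with a one-hidden-layer MLP applied to the concatenation $[\mathbf{e}_u \,\|\, \mathbf{e}_v \,\|\, \mathbf{z}_{(u,v)}]$, which is the decoder form the DirLP overview describes. The function names, the ReLU hidden layer, and the hand-picked weights are assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot_decoder(e_u, e_v):
    """Symmetric decoder: the score for (u, v) always equals the score for (v, u)."""
    return sigmoid(sum(a * b for a, b in zip(e_u, e_v)))


def concat_mlp_decoder(e_u, e_v, z_uv, w_hidden, w_out):
    """Asymmetric decoder (illustrative): one-hidden-layer MLP on [e_u || e_v || z_uv].

    e_u and e_v occupy different input slots, so swapping the endpoints
    generally changes the score.
    """
    x = list(e_u) + list(e_v) + list(z_uv)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]  # ReLU
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


# Hypothetical toy embeddings and hand-picked weights.
e_u, e_v = [1.0, 0.0], [0.0, 1.0]
z_uv, z_vu = [], []                # no structural edge features in this toy case
w_hidden = [[1.0, 0.0, 0.0, 0.0]]  # the single hidden unit reads the source's first coordinate
w_out = [1.0]

print(dot_decoder(e_u, e_v), dot_decoder(e_v, e_u))
print(concat_mlp_decoder(e_u, e_v, z_uv, w_hidden, w_out),
      concat_mlp_decoder(e_v, e_u, z_vu, w_hidden, w_out))
```

Swapping the endpoints leaves the dot-product score unchanged for any embeddings. On a directed ring like the one in Figure 1, that decoder therefore assigns a true edge $(u, v)$ and its reverse $(v, u)$ the same probability. The concatenation decoder has no such constraint: here it scores $(u, v)$ at $\sigma(1) \approx 0.73$ and $(v, u)$ at $0.5$.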

Abstract

Link prediction is a widely studied task in Graph Representation Learning (GRL) for modeling relational data. Early theory in GRL assumed a symmetric adjacency matrix, reflecting an undirected setting. As a result, much subsequent state-of-the-art research has continued to operate under this symmetry assumption, even though in real-world data the direction of a relationship often carries crucial information. This oversight limits the ability of these models to capture the complexity of directed interactions. In this paper, we focus on directed link prediction by evaluating key heuristics that have been successful in undirected settings. We propose simple but effective adaptations of these heuristics to the directed link prediction task and demonstrate that these modifications achieve performance competitive with leading Graph Neural Networks (GNNs) originally designed for undirected graphs. Through an extensive set of experiments, we derive insights that inform the development of a novel framework for directed link prediction, which not only surpasses baseline methods but also outperforms state-of-the-art GNNs on multiple benchmarks.
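
The abstract does not say which heuristics are adapted or how. As a hypothetical illustration only, the classic common-neighbor (CN) count can be split by direction into four counts: shared successors, shared predecessors, and two-hop paths in each direction. The function names and the choice of these four variants are assumptions, and the paper's exact directed features may differ.

```python
from collections import defaultdict


def build_neighbors(edges):
    """Map each node to its set of successors (out) and predecessors (in)."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    return out_nbrs, in_nbrs


def directed_common_neighbors(u, v, out_nbrs, in_nbrs):
    """Hypothetical direction-aware splits of the common-neighbor count for (u, v)."""
    return {
        "out_out": len(out_nbrs[u] & out_nbrs[v]),  # shared successors
        "in_in": len(in_nbrs[u] & in_nbrs[v]),      # shared predecessors
        "out_in": len(out_nbrs[u] & in_nbrs[v]),    # 2-hop paths u -> w -> v
        "in_out": len(in_nbrs[u] & out_nbrs[v]),    # 2-hop paths v -> w -> u
    }


def undirected_common_neighbors(u, v, out_nbrs, in_nbrs):
    """Classic CN on the symmetrized graph; always equal for (u, v) and (v, u)."""
    return len((out_nbrs[u] | in_nbrs[u]) & (out_nbrs[v] | in_nbrs[v]))


edges = [(0, 1), (1, 2), (0, 3), (3, 2), (4, 0), (4, 2)]
out_nbrs, in_nbrs = build_neighbors(edges)
print(directed_common_neighbors(0, 2, out_nbrs, in_nbrs))
print(directed_common_neighbors(2, 0, out_nbrs, in_nbrs))
print(undirected_common_neighbors(0, 2, out_nbrs, in_nbrs),
      undirected_common_neighbors(2, 0, out_nbrs, in_nbrs))
```

In this toy graph, undirected CN gives 3 for both $(0, 2)$ and $(2, 0)$. The directed split instead reports two paths $0 \to w \to 2$ and none in the reverse direction, so querying $(2, 0)$ swaps the `out_in` and `in_out` counts.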

Paper Structure

This paper contains 14 sections, 1 theorem, 19 equations, 4 figures, 9 tables.

Key Result

Theorem 1

Let $\mathcal{M}_{\operatorname{sGNN}}$ be the family of GNNs defined by Equation eq:mpnn, equipped with a symmetric decoder and augmented by undirected structural features. Additionally, let $\mathcal{M}_{\operatorname{DirLP}}$ be the family of all models defined by Equation eq:dirlp_decoder. …
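
The theorem statement is truncated here. One intuition for why symmetry is limiting is sketched below. This is not the paper's statement or proof, and the decoder symbol $f$ and sigmoid $\sigma$ are assumed notation. Write a model's score for a candidate edge as a decoder applied to the endpoint embeddings and the structural edge features. If the decoder is symmetric in its first two arguments and the features are undirected, then the score cannot depend on edge orientation:

```latex
\hat{y}_{(u,v)} = \sigma\big(f(\mathbf{e}_u, \mathbf{e}_v, \mathbf{z}_{(u,v)})\big),
\qquad
f(\mathbf{a}, \mathbf{b}, \mathbf{z}) = f(\mathbf{b}, \mathbf{a}, \mathbf{z}),
\qquad
\mathbf{z}_{(u,v)} = \mathbf{z}_{(v,u)}

\Longrightarrow\quad
\hat{y}_{(u,v)}
= \sigma\big(f(\mathbf{e}_v, \mathbf{e}_u, \mathbf{z}_{(u,v)})\big)
= \sigma\big(f(\mathbf{e}_v, \mathbf{e}_u, \mathbf{z}_{(v,u)})\big)
= \hat{y}_{(v,u)}.
```

For any positive edge $(u, v) \in \mathcal{E}$ whose reverse $(v, u)$ is a negative, every model in $\mathcal{M}_{\operatorname{sGNN}}$ therefore assigns both the same probability. A decoder that applies an MLP to the concatenation $[\mathbf{e}_u \,\|\, \mathbf{e}_v \,\|\, \mathbf{z}_{(u,v)}]$, as DirLP does, is not bound by this constraint.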

Figures (4)

  • Figure 1: An illustrative evaluation demonstrating the impact of incorporating directionality into the design of predictive models. We generate a directed ring graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Each node $u \in \mathcal{V}$ is initialized with a two-dimensional embedding $\mathbf{e}_u^{(0)}$. We then train GraphSage (hamilton2017inductive) to update these embeddings for link prediction using two different decoders: (1) an undirected decoder and (2) a directed decoder. Visualizing the output node embeddings in both cases shows that the structural representation of nodes is significantly enhanced when using a directed decoder, emphasizing the importance of directionality in model design.
  • Figure 2: Overview of DirLP. Given the input graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and node features $\mathbf{x}_u \in \mathbb{R}^{d_{\operatorname{raw}}}, \forall u \in \mathcal{V}$, DirLP follows a series of steps to predict directed links. First, a set of negative edges $\mathcal{E}^{-}$ is generated. Next, for each edge $(u, v)$ in the set $\mathcal{E} \cup \mathcal{E}^{-}$, structural edge features $\mathbf{z}_{(u,v)} \in \mathbb{R}^{d_{\operatorname{str}}}$ are computed. Then, directional labels are assigned to each node $u \in \mathcal{V}$, and intermediate node features $\mathbf{x}'_u \in \mathbb{R}^{d_{\operatorname{raw}} + d_{\operatorname{label}}}$ are constructed by concatenating the original node features with the directional labels. The model then applies DirGNN message passing to produce node embeddings $\mathbf{e}_u \in \mathbb{R}^{d_{\operatorname{enc}}}$ for $u \in \mathcal{V}$. For each edge $(u, v)$ in $\mathcal{E} \cup \mathcal{E}^{-}$, the edge features are concatenated with the node embeddings of the edge's endpoints. Finally, these concatenated embeddings are passed through an MLP followed by a sigmoid activation function to make predictions.
  • Figure 3: Statistical Comparison Between GraphSage and DirLP. The violin plots illustrate the performance of GraphSage and DirLP in terms of MRR across multiple data splits.
  • Figure 4: A complete graph with four nodes.
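
The DirGNN message-passing step in Figure 2 can be illustrated with the pure-Python sketch below. It is in the spirit of direction-aware message passing and is not DirLP's actual encoder. It aggregates in-neighbors and out-neighbors separately, transforms each with its own weight matrix, and mixes them with a weight `alpha`. Mean aggregation, the ReLU, the single `alpha`, and all function names are assumptions.

```python
def mean_vectors(vectors, dim):
    """Element-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dir_gnn_layer(h, edges, W_self, W_in, W_out, alpha=0.5):
    """One direction-aware message-passing layer (hypothetical DirGNN-style sketch).

    h maps node -> feature vector; edges is a list of (src, dst) pairs.
    In-neighbor and out-neighbor messages are aggregated separately,
    transformed by different weights, and mixed with weight alpha.
    """
    dim = len(next(iter(h.values())))
    in_nbrs = {u: [] for u in h}
    out_nbrs = {u: [] for u in h}
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
    new_h = {}
    for u in h:
        m_in = matvec(W_in, mean_vectors([h[w] for w in in_nbrs[u]], dim))
        m_out = matvec(W_out, mean_vectors([h[w] for w in out_nbrs[u]], dim))
        self_term = matvec(W_self, h[u])
        pre = [s + alpha * a + (1 - alpha) * b for s, a, b in zip(self_term, m_in, m_out)]
        new_h[u] = [max(0.0, x) for x in pre]  # ReLU
    return new_h


# Hypothetical 1-d features on the path 0 -> 1 -> 2, then on the reversed path.
h = {0: [1.0], 1: [2.0], 2: [4.0]}
W_self, W_in, W_out = [[1.0]], [[1.0]], [[0.0]]
forward = dir_gnn_layer(h, [(0, 1), (1, 2)], W_self, W_in, W_out)
reverse = dir_gnn_layer(h, [(1, 0), (2, 1)], W_self, W_in, W_out)
print(forward)  # {0: [1.0], 1: [2.5], 2: [5.0]}
print(reverse)  # {0: [2.0], 1: [4.0], 2: [4.0]}
```

Direction sensitivity comes from treating the two neighbor sets differently. If `W_in == W_out` and `alpha = 0.5`, reversing every edge simply swaps the two messages. The layer's output is then unchanged, and it behaves like a direction-agnostic update.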

Theorems & Definitions (1)

  • Theorem 1