Link Prediction on Textual Edge Graphs

Chen Ling, Zhuofeng Li, Yuntong Hu, Zheng Zhang, Zhongyuan Liu, Shuang Zheng, Jian Pei, Liang Zhao

TL;DR

This paper proposes summarizing the neighborhood information between a node pair as a human-written document that preserves both semantic and topological information, and uses a self-supervised learning model to transfer the text-understanding ability of language models to GNNs.

Abstract

Textual-edge Graphs (TEGs), characterized by rich text annotations on edges, are increasingly significant in network science due to their ability to capture rich contextual information among entities. Existing works have proposed various edge-aware graph neural networks (GNNs) or let language models directly make predictions. However, the former often fall short of fully capturing the contextualized semantics on edges, and the latter of capturing graph topology. This inadequacy is particularly evident in link prediction tasks, which require a comprehensive understanding of both graph topology and the semantics between nodes. In this paper, we present a novel framework, Link2Doc, designed specifically for link prediction on textual-edge graphs. Specifically, we propose to summarize neighborhood information between node pairs as a human-written document to preserve both semantic and topological information. A self-supervised learning model is then used to transfer text-understanding ability from language models to the GNN. Empirical evaluations on four real-world datasets, including link prediction, edge classification, parameter analysis, runtime comparison, and ablation studies, demonstrate that Link2Doc generally outperforms existing edge-aware GNNs and pre-trained language models in predicting links on TEGs.

Paper Structure (17 sections, 4 equations, 5 figures, 5 tables, 1 algorithm)

Figures (5)

  • Figure 1: An example of textual-edge graphs: two books are connected by citation links. Predicting whether there'll be a citation between $A$ and $E$ needs to jointly consider both topology and semantic information embedded on nodes and their edges.
  • Figure 2: Overall framework of LLM-enhanced link prediction on textual-edge graphs, where orange and blue nodes in $G_{(s,t)}$ belong to $s$'s and $t$'s local neighborhoods (namely $G_s$ and $G_t$), respectively. Half-blue, half-orange nodes denote nodes shared between $G_s$ and $G_t$.
  • Figure 3: Given the transition graph $G_{(s,t)}$, we first split off $G_s$ (nodes marked in orange), which corresponds to the local structure of $s$ ($G_t$ is omitted due to space limits). Shared nodes are marked half blue and half orange. We transform the local structure of $G_s$ into a paragraph that summarizes the hierarchical relations, with $s$ as the root. For better visibility, hidden edges are highlighted in orange, and shared nodes are highlighted in blue.
  • Figure 4: Leveraging composed documents to enhance base GNNs on Amazon-APPs dataset.
  • Figure 5: The performance on Amazon-APPs.
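
The procedure described in the Figure 3 caption (take the local structure around $s$, mark nodes shared with $t$'s neighborhood, and write it out as a hierarchy rooted at $s$) can be illustrated with a minimal sketch. This is not the paper's actual document template, which is not given here. The function names `local_neighborhood` and `render_document`, the hop limit, the toy citation graph, the treatment of edges as undirected, and the indented output format are all assumptions made for illustration.

```python
from collections import deque


def local_neighborhood(edges, root, hops):
    """Breadth-first search tree around `root`, limited to `hops` hops.

    `edges` maps a node pair (u, v) to its edge text; edges are treated
    as undirected (an assumption of this sketch).
    Returns {node: (parent, depth, edge_text)}.
    """
    adjacency = {}
    for (u, v), text in edges.items():
        adjacency.setdefault(u, []).append((v, text))
        adjacency.setdefault(v, []).append((u, text))

    tree = {root: (None, 0, None)}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        depth = tree[node][1]
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for neighbor, text in sorted(adjacency.get(node, [])):
            if neighbor not in tree:
                tree[neighbor] = (node, depth + 1, text)
                queue.append(neighbor)
    return tree


def render_document(tree, root, shared):
    """Write the BFS tree as an indented outline with `root` at the top.

    Nodes in `shared` (present in both endpoints' neighborhoods) are tagged.
    The line format is hypothetical, not the paper's template.
    """
    children = {}
    for node, (parent, _, _) in tree.items():
        if parent is not None:
            children.setdefault(parent, []).append(node)

    lines = [f"Node {root} is the root."]

    def visit(node):
        for child in sorted(children.get(node, [])):
            _, depth, text = tree[child]
            tag = " (shared)" if child in shared else ""
            lines.append(f"{'  ' * depth}{node} -> {child}{tag}: {text}")
            visit(child)

    visit(root)
    return "\n".join(lines)


# Toy textual-edge graph (made-up data): edge text describes each link.
edges = {
    ("A", "B"): "cites B for background",
    ("B", "C"): "extends the method of C",
    ("E", "C"): "compares against C",
    ("E", "D"): "reuses the dataset from D",
}
tree_s = local_neighborhood(edges, "A", hops=2)  # stands in for G_s
tree_t = local_neighborhood(edges, "E", hops=2)  # stands in for G_t
shared = set(tree_s) & set(tree_t)               # nodes common to both
doc_s = render_document(tree_s, "A", shared)
print(doc_s)
```

Under these assumptions, the resulting text for the candidate pair (A, E) keeps both the edge semantics (the edge text on each line) and the topology (the indentation depth and the shared-node tags), which is the kind of input a language model could then read.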

Theorems & Definitions (2)

  • Definition 1: Textual-edge Graphs
  • Definition 2: Transition Graph