Optimizing Long-tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement

Yakun Wang, Daixin Wang, Hongrui Liu, Binbin Hu, Yingcui Yan, Qiyang Zhang, Zhiqiang Zhang

TL;DR

This work shows that link prediction accuracy in Graph Neural Networks is tied more closely to the number of common neighbors (CNs) between a node pair than to node degree, and that tail node pairs suffer because the CN distribution is long-tailed. It introduces LTLP, a plug-in framework with a Structure Enhancement module that adds high-quality edges and a Representation Enhancement module that aligns tail and head representations, improving tail performance while remaining backbone-agnostic. Experiments across multiple datasets and backbones show that LTLP delivers superior overall performance and substantially narrows the head-tail gap, supporting the importance of both structure enrichment and representation regularization. The approach offers a practical way to strengthen link prediction on imbalanced graphs and invites further study of CN-based long-tailed effects in more complex structures.

Abstract

Link prediction, a fundamental task for graph neural networks (GNNs), has seen significant progress across varied domains. Its success is typically influenced by the expressive power of node representations, but recent work reveals that low-degree nodes perform worse owing to their sparse neighbor connections, known as the degree-based long-tailed problem. Does the degree-based long-tailed distribution similarly constrain the efficacy of GNNs on link prediction? Unexpectedly, our study reveals that only a mild correlation exists between node degree and predictive accuracy and, more importantly, that the number of common neighbors between node pairs exhibits a strong correlation with accuracy. Because node pairs with fewer common neighbors, i.e., tail node pairs, make up a substantial fraction of the dataset yet achieve worse performance, we argue that link prediction also faces a long-tailed problem, and that link prediction with GNNs is greatly hindered by tail node pairs. Given this weakness, a natural question is how to eliminate the negative effects of the skewed long-tailed distribution of common neighbors so as to improve link prediction performance. To this end, we introduce our long-tailed framework (LTLP), which is designed to enhance the performance of tail node pairs on link prediction by increasing their common neighbors. Two key modules in LTLP respectively supplement high-quality edges for tail node pairs and enforce representational alignment between head and tail node pairs within the same category, thereby improving the performance of tail node pairs.
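
To make the notion of head and tail node pairs concrete, the following is a minimal Python sketch that counts common neighbors for candidate node pairs and splits them by a cutoff. It is an illustration only, not the authors' implementation: the function names, the toy graph, and the cutoff `threshold=1` are all assumptions introduced here, and the abstract does not state how the head/tail boundary is actually chosen.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Count the nodes adjacent to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def split_head_tail(adj, pairs, threshold=1):
    """Split node pairs into head and tail groups by common-neighbor count.

    Pairs with at most `threshold` common neighbors are treated as tail pairs.
    The cutoff is a hypothetical choice made for this sketch.
    """
    head, tail = [], []
    for u, v in pairs:
        if common_neighbors(adj, u, v) <= threshold:
            tail.append((u, v))
        else:
            head.append((u, v))
    return head, tail


# Toy graph (hypothetical data, for illustration only).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)

candidate_pairs = [(0, 3), (1, 2), (0, 4), (2, 4)]
head_pairs, tail_pairs = split_head_tail(adj, candidate_pairs, threshold=1)
print("head:", head_pairs)  # pairs with more than one common neighbor
print("tail:", tail_pairs)  # pairs with at most one common neighbor
```

In this toy graph, the pairs (0, 3) and (1, 2) each share two neighbors and land in the head group, while (0, 4) and (2, 4) share zero and one neighbor respectively and land in the tail group. On a real dataset one would expect the tail group to be much larger, which is the imbalance the abstract describes.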

Paper Structure

This paper contains 34 sections, 8 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: The correlation between link prediction accuracy and different measures, i.e., degrees and common neighbors (CNs), using GCN and SEAL.
  • Figure 2: Overview of our proposed LTLP framework.
  • Figure 3: Analysis of positive and negative samples on the Cora dataset. $S$ denotes the difficulty level, $(\mathbb{V}_{u,v|v \in \mathcal{N}_{u}})_{md}$ is the median normalized variance of each epoch on the set $\{(u,v) \mid v\in\mathcal{N}_u\}$, and $R_{ler}$ is the label error ratio.
  • Figure 4: Ablation studies within LTLP and SEM module.
  • Figure 5: Hyper-parameter analysis on the Cora dataset.
  • ...and 3 more figures