Norm Augmented Graph AutoEncoders for Link Prediction

Yunhui Liu, Huaisong Zhang, Xinyi Gao, Liuye Guo, Zhen Tao, Tieke He

TL;DR

This work identifies a degree-related bias in Graph AutoEncoders (GAEs) for link prediction. Higher-degree nodes acquire larger embedding norms, which inflate the scores of positive links and suppress the scores of negative links. The authors identify this embedding-norm imbalance as the root cause and propose Norm Augmentation (NA), a simple, plug-and-play strategy. NA adds $(d_t-d_i)$ self-loops to the training objective for each low-degree node $i$, where $d_i$ is the node's degree and falls below a threshold $d_t$. This raises the node's embedding norm and balances performance across degrees. NA works with various GAE backbones at minimal overhead and yields consistent improvements across multiple datasets and encoders, often outperforming existing degree-fair baselines. Because it acts directly on the training objective, it offers a practical way to improve both the fairness and the accuracy of link prediction in long-tailed graphs.
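The sketch below shows one way to generate the extra self-loop pairs. It assumes an undirected edge list in which each edge appears once and no self-loops exist yet. It also reads $d_t$ as the degree threshold that defines a low-degree node. The helper name `norm_augment_pairs` is hypothetical, and this is an illustration, not the authors' implementation.

```python
from collections import Counter


def norm_augment_pairs(edges, num_nodes, d_t):
    """Hypothetical helper: extra (i, i) positive pairs for low-degree nodes.

    Assumes `edges` is an undirected edge list with each edge listed once and
    no existing self-loops; `d_t` is read as the low-degree threshold.
    """
    # Count each endpoint once per edge to get node degrees.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    extra = []
    for i in range(num_nodes):
        d_i = degree[i]  # Counter returns 0 for isolated nodes
        if d_i < d_t:
            # Add (d_t - d_i) copies of the self-pair (i, i).
            extra.extend([(i, i)] * (d_t - d_i))
    return extra


# Toy graph: node 0 has degree 3, nodes 1 and 2 degree 2, node 3 degree 1, node 4 is isolated.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
extra = norm_augment_pairs(edges, num_nodes=5, d_t=2)
train_pos = edges + extra  # positive pairs fed to the reconstruction loss
print(extra)  # [(3, 3), (4, 4), (4, 4)]
```

In this sketch, the returned pairs are appended only to the positive training pairs. The graph used for message passing in the encoder stays unchanged.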

Abstract

Link Prediction (LP) is a crucial problem in graph-structured data. Graph Neural Networks (GNNs) have gained prominence in LP, with Graph AutoEncoders (GAEs) being a notable representative. However, our empirical findings reveal that GAEs' LP performance suffers heavily from the long-tailed node degree distribution, i.e., low-degree nodes tend to exhibit inferior LP performance compared to high-degree nodes. What causes this degree-related bias, and how can it be mitigated? In this study, we demonstrate that the norm of node embeddings learned by GAEs varies among nodes with different degrees and plays a central role in the final LP performance. Specifically, embeddings with larger norms tend to guide the decoder towards predicting higher scores for positive links and lower scores for negative links, thereby contributing to superior performance. This observation motivates us to improve GAEs' LP performance on low-degree nodes by increasing their embedding norms, which can be implemented simply yet effectively by introducing additional self-loops into the training objective for low-degree nodes. This norm augmentation strategy can be seamlessly integrated into existing GAE methods at low computational cost. Extensive experiments on various datasets and GAE methods demonstrate the superior performance of norm-augmented GAEs.
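This example illustrates why larger norms can help. It assumes the standard GAE inner-product decoder $\sigma(\mathbf{z}_u^\top \mathbf{z}_v)$ trained with binary cross-entropy, and individual backbones may use other decoders. Scaling both embeddings by $s$ multiplies the logit by $s^2$. An aligned positive pair is therefore pushed toward 1, and an anti-aligned negative pair toward 0. A self-loop pair $(i,i)$ scores $\sigma(\lVert\mathbf{z}_i\rVert^2)$, and its loss $-\log\sigma(\lVert\mathbf{z}_i\rVert^2)$ shrinks as the norm grows. This is one plausible reading of how extra self-loops raise low-degree norms. The toy vectors and helper names below are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative unit-norm embeddings (hypothetical values).
Z_U = [1.0, 0.0]
Z_POS = [0.8, 0.6]    # cos = 0.8 with Z_U: a "positive" neighbor
Z_NEG = [-0.6, 0.8]   # cos = -0.6 with Z_U: a "negative" sample


def decoder_scores(s):
    """Inner-product decoder scores when every embedding is rescaled to norm s."""
    u = [s * x for x in Z_U]
    pos = sigmoid(dot(u, [s * x for x in Z_POS]))
    neg = sigmoid(dot(u, [s * x for x in Z_NEG]))
    return pos, neg


def self_loop_loss(norm):
    """BCE loss of a self-pair (i, i): -log sigmoid(||z_i||^2) = log(1 + exp(-||z_i||^2))."""
    return math.log1p(math.exp(-norm ** 2))


for s in (1.0, 2.0, 4.0):
    pos, neg = decoder_scores(s)
    print(f"norm={s}: pos={pos:.3f} neg={neg:.3f} self-loop loss={self_loop_loss(s):.4f}")
```

The logit scales with the product of the two norms. A node whose embedding norm stays small therefore keeps its scores close to 0.5 whatever its direction, which is consistent with the weaker low-degree performance described in the abstract.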

Paper Structure

This paper contains 18 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Degree distribution of three datasets.
  • Figure 2: GAE's LP performance distribution w.r.t. node degree.
  • Figure 3: Embedding norm distribution w.r.t. node degree.
  • Figure 4: Mean link probability of positive links in the test set w.r.t. node degree.
  • Figure 5: LP performance of different norm-augmented GAEs on Cora (Top) and CiteSeer (Bottom) w.r.t. node degree.
  • ...and 3 more figures