Norm Augmented Graph AutoEncoders for Link Prediction
Yunhui Liu, Huaisong Zhang, Xinyi Gao, Liuye Guo, Zhen Tao, Tieke He
TL;DR
This work identifies a degree-related bias in Graph AutoEncoders for link prediction, showing that higher-degree nodes acquire larger embedding norms which inflate positive scores and suppress negatives. The authors diagnose embedding-norm imbalance as the root cause and propose Norm Augmentation (NA), a simple, plug-and-play strategy that adds $(d_t-d_i)$ self-loops for low-degree nodes to raise their embedding norms and balance performance. NA integrates with various GAE backbones with minimal overhead and yields consistent improvements across multiple datasets and encoders, often outperforming existing degree-fair baselines. The approach directly targets the training objective, offering a practical solution to enhance link prediction fairness and accuracy in long-tailed graphs.
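The self-loop rule in the summary above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: $d_t$ is taken to be a target degree threshold chosen as a hyperparameter, the graph is undirected with each edge listed once, and self-loops are simply appended to the positive pairs of the reconstruction loss. The function name `norm_augmented_positives` is hypothetical.

```python
from collections import Counter


def norm_augmented_positives(edges, num_nodes, d_t):
    """Return positive training pairs with extra self-loops for low-degree nodes.

    Assumptions (not taken from the paper): `edges` lists each undirected
    edge once, and `d_t` is a target degree threshold.
    """
    # Count the degree of every node from the undirected edge list.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    augmented = list(edges)
    for i in range(num_nodes):
        deficit = d_t - degree[i]
        if deficit > 0:
            # Node i is below the threshold: add (d_t - d_i) self-loop positives.
            augmented.extend([(i, i)] * deficit)
    return augmented


# Toy graph: node 0 is a hub, nodes 1, 2 and 4 have degree 1.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
pairs = norm_augmented_positives(edges, num_nodes=5, d_t=2)
```

In this toy graph only nodes 1, 2 and 4 fall below the threshold, so each receives one self-loop, while nodes 0 and 3 are left unchanged. The augmented pair list would then replace the original positive edges when computing the GAE reconstruction loss.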
Abstract
Link Prediction (LP) is a crucial problem in graph-structured data. Graph Neural Networks (GNNs) have gained prominence in LP, with Graph AutoEncoders (GAEs) being a notable representative. However, our empirical findings reveal that GAEs' LP performance suffers heavily from the long-tailed node degree distribution, i.e., low-degree nodes tend to exhibit inferior LP performance compared to high-degree nodes. \emph{What causes this degree-related bias, and how can it be mitigated?} In this study, we demonstrate that the norm of the node embeddings learned by GAEs varies with node degree, and that this norm plays a central role in the final LP performance. Specifically, embeddings with larger norms tend to guide the decoder towards predicting higher scores for positive links and lower scores for negative links, thereby contributing to superior performance. This observation motivates us to improve GAEs' LP performance on low-degree nodes by increasing their embedding norms, which can be implemented simply yet effectively by introducing additional self-loops into the training objective for low-degree nodes. This norm augmentation strategy can be seamlessly integrated into existing GAE methods at little computational cost. Extensive experiments on various datasets and GAE methods show the superior performance of norm-augmented GAEs.
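The link between embedding norm and decoder score can be made concrete with a short worked identity. It assumes the standard inner-product decoder commonly used with GAEs, which the abstract does not state explicitly; $z_i$ denotes the embedding of node $i$, $\sigma$ the sigmoid, and $\theta_{ij}$ the angle between $z_i$ and $z_j$.

$$
\hat{A}_{ij} = \sigma\left(z_i^\top z_j\right) = \sigma\left(\|z_i\|\,\|z_j\|\cos\theta_{ij}\right),
\qquad
\hat{A}_{ii} = \sigma\left(z_i^\top z_i\right) = \sigma\left(\|z_i\|^2\right).
$$

Under this assumption, for a fixed angle with $\cos\theta_{ij} > 0$ a larger norm pushes the score toward 1, and with $\cos\theta_{ij} < 0$ it pushes the score toward 0. A self-loop treated as a positive pair contributes the term $\log \sigma(\|z_i\|^2)$ to the objective, which increases monotonically with $\|z_i\|$, so adding self-loops for a low-degree node encourages a larger embedding norm for that node.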
