CORE: Data Augmentation for Link Prediction via Information Bottleneck
Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla
TL;DR
CORE tackles the challenge of noisy and incomplete graphs in link prediction by introducing a two-stage data augmentation framework grounded in the Information Bottleneck principle. The Complete stage augments the graph by adding high-probability edges, while the Reduce stage prunes edges through a Graph Information Bottleneck objective, applied on per-target-link subgraphs to avoid cross-link interference. The method uses variational bounds to optimize a loss that balances predictive power against compression, supported by theoretical guarantees under local-dependency assumptions. Empirically, CORE consistently improves Hits@50, improves the performance of heuristic predictors, and enhances robustness to adversarial perturbations across diverse datasets and backbones, demonstrating practical value for robust link prediction.
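The summary above reports results in Hits@50 without defining it. As a minimal sketch, assuming the common convention used by OGB-style evaluators (a positive link counts as a hit when its score is strictly higher than the K-th highest negative score), the metric can be computed as follows. The function name `hits_at_k` and the toy scores are illustrative, not taken from the paper.

```python
from typing import Sequence


def hits_at_k(pos_scores: Sequence[float],
              neg_scores: Sequence[float],
              k: int = 50) -> float:
    """Fraction of positive links scored strictly above the k-th best negative.

    Assumed convention (not stated in the source): OGB-style Hits@K.
    """
    if not pos_scores:
        raise ValueError("pos_scores must not be empty")
    if len(neg_scores) < k:
        # Fewer than k negatives: every positive trivially ranks in the top k.
        return 1.0
    # Score of the k-th highest-scoring negative link.
    kth_best_negative = sorted(neg_scores, reverse=True)[k - 1]
    hits = sum(1 for s in pos_scores if s > kth_best_negative)
    return hits / len(pos_scores)


# Toy scores (made up for illustration only).
pos = [0.9, 0.6, 0.2]
neg = [0.8, 0.5, 0.4, 0.1]
toy_hits_at_2 = hits_at_k(pos, neg, k=2)
```

With these toy scores and k=2, the threshold is the second-highest negative score (0.5), so two of the three positives count as hits.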
Abstract
Link prediction (LP) is a fundamental task in graph representation learning, with numerous applications in diverse domains. However, the generalizability of LP models is often compromised by noisy or spurious information in graphs and by the inherent incompleteness of graph data. To address these challenges, we draw inspiration from the Information Bottleneck principle and propose a novel data augmentation method, COmplete and REduce (CORE), to learn compact and predictive augmentations for LP models. In particular, CORE aims to recover missing edges in graphs while simultaneously removing noise from the graph structures, thereby enhancing the model's robustness and performance. Extensive experiments on multiple benchmark datasets demonstrate the applicability and superiority of CORE over state-of-the-art methods, showcasing its potential as a leading approach for robust LP in graph representation learning.
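For orientation, the generic Information Bottleneck trade-off behind "compact and predictive" augmentations can be written as below. This is the standard form of the principle, not necessarily the exact objective used by CORE, and the symbols are assumptions for illustration: $G$ is the observed graph, $G'$ the augmented graph, $Y$ the link label, $I(\cdot\,;\cdot)$ mutual information, and $\beta > 0$ a trade-off weight.

```latex
\min_{G'} \; -\, I(G'; Y) \;+\; \beta \, I(G'; G)
```

The first term rewards augmentations that remain predictive of the link label, while the second penalizes information retained from the original graph, which is what encourages compression.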
