Hard negative sampling in hyperedge prediction
Zhenyu Deng, Tao Zhou, Yilin Bi
TL;DR
This work tackles the challenge of negative sampling in hyperedge prediction by introducing hard negative sampling (HNS), which synthesizes difficult negatives in the hyperedge embedding space through positive-sample injection with a tunable strength $\alpha$. By leveraging hypergraph embeddings and an attention-based synthesis mechanism, HNS yields harder negatives that improve classifier discrimination and model robustness across multiple datasets and architectures. The approach is plug-and-play, scalable with mini-batch training and GPU acceleration, and extends beyond a single model to diverse HGNN-based pipelines. Limitations include dependence on embedding quality and the need to tune $\alpha$, with future work proposed on adaptive injection strategies and broader applications in graph learning tasks.
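The TL;DR does not spell out the synthesis rule. As a rough illustration only, the sketch below makes two assumptions. First, a hyperedge embedding is obtained by mean-pooling its node embeddings. Second, positive-sample injection is modeled as a linear interpolation $h = \alpha h^{+} + (1-\alpha) h^{-}$ between a positive and a randomly drawn negative hyperedge embedding. The names hyperedge_embedding, inject_positive and sq_dist are hypothetical, and the paper's attention-based synthesis is not reproduced here.

```python
from typing import List, Sequence

Vector = List[float]


def hyperedge_embedding(node_emb: Sequence[Vector], nodes: Sequence[int]) -> Vector:
    """Mean-pool the node embeddings of one hyperedge.

    Assumption: mean pooling stands in for the attention-based aggregator
    mentioned in the TL;DR; any learned aggregator could be used instead.
    """
    dim = len(node_emb[0])
    total = [0.0] * dim
    for v in nodes:
        for d in range(dim):
            total[d] += node_emb[v][d]
    return [x / len(nodes) for x in total]


def inject_positive(pos: Vector, neg: Vector, alpha: float) -> Vector:
    """Hypothetical positive-sample injection by linear interpolation.

    alpha = 0 keeps the original negative; larger alpha moves it toward the
    positive embedding, making it harder to tell apart. alpha = 1 would
    reproduce the positive itself, so alpha is restricted to [0, 1).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return [alpha * p + (1.0 - alpha) * n for p, n in zip(pos, neg)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy 2-D node embeddings (made-up values for illustration).
    node_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    pos = hyperedge_embedding(node_emb, [0, 1, 2])  # an observed hyperedge
    neg = hyperedge_embedding(node_emb, [1, 3])     # a random negative node set
    for alpha in (0.0, 0.5, 0.9):
        hard = inject_positive(pos, neg, alpha)
        # Distance to the positive shrinks as alpha grows: a "harder" negative.
        print(alpha, round(sq_dist(hard, pos), 4))
```

Under this assumption, the squared distance from the synthesized negative to the positive shrinks by a factor of $(1-\alpha)^2$. This is one way to read "tunable strength": larger $\alpha$ yields harder negatives, at the risk of producing samples that are nearly indistinguishable from positives.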
Abstract
A hypergraph, in which each hyperedge can contain an arbitrary number of nodes, is a powerful tool for modeling multi-entity interactions. Hyperedge prediction is a fundamental task that aims to predict future hyperedges or to identify existing but unobserved hyperedges from the observed ones. In link prediction for simple graphs, most observed links are treated as positive samples, while all unobserved links are treated as negative samples. However, this full-sampling strategy is impractical for hyperedge prediction, because the number of unobserved hyperedges in a hypergraph far exceeds the number of observed ones. A negative sampling method is therefore needed to generate negative samples whose quantity is comparable to that of the positive samples. In current hyperedge prediction, random selection of negative samples is routine practice. Through experimental analysis, however, we identify a critical limitation of random selection: the generated negative samples are too easily distinguished from positive samples. This leads to premature convergence of the model and reduces prediction accuracy. To overcome this issue, we propose a novel method for generating negative samples, named hard negative sampling (HNS). Unlike traditional methods that construct negative hyperedges by selecting node sets from the original hypergraph, HNS directly synthesizes negative samples in the hyperedge embedding space, thereby generating more challenging and informative negative samples. Our results demonstrate that HNS significantly enhances both the accuracy and the robustness of prediction. Moreover, as a plug-and-play technique, HNS can easily be applied to the training of various hyperedge prediction models based on representation learning.
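Two claims in the abstract lend themselves to a quick illustration. The first is that unobserved hyperedges vastly outnumber observed ones. The second concerns what the routine random-selection baseline looks like. The sketch below counts candidate node sets of a fixed size with math.comb and draws one unobserved, size-matched node set per positive hyperedge. Matching each negative's size to its positive and sampling nodes uniformly are assumptions made for illustration. candidate_count and random_negatives are hypothetical helpers, not code from the paper.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Set


def candidate_count(num_nodes: int, size: int) -> int:
    """Number of distinct node sets of the given size, i.e. possible hyperedges."""
    return math.comb(num_nodes, size)


def random_negatives(
    num_nodes: int,
    positives: Sequence[FrozenSet[int]],
    seed: int = 0,
    max_tries: int = 1000,
) -> List[FrozenSet[int]]:
    """Draw one random, unobserved node set per positive hyperedge.

    Each negative has the same size as its positive (an assumption); nodes are
    sampled uniformly without replacement, and candidates that coincide with an
    observed hyperedge are rejected and redrawn.
    """
    rng = random.Random(seed)
    observed: Set[FrozenSet[int]] = set(positives)
    negatives: List[FrozenSet[int]] = []
    for pos in positives:
        for _ in range(max_tries):
            cand = frozenset(rng.sample(range(num_nodes), len(pos)))
            if cand not in observed:
                negatives.append(cand)
                break
        else:
            # Every draw hit an observed hyperedge; give up instead of looping forever.
            raise RuntimeError(f"no unobserved node set of size {len(pos)} found")
    return negatives


if __name__ == "__main__":
    print(candidate_count(1000, 3))  # → 166167000
    positives = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({4, 5, 6, 7})]
    print(random_negatives(10, positives, seed=42))
```

Even a hypergraph with only 1,000 nodes admits candidate_count(1000, 3) = 166,167,000 possible size-3 hyperedges. Scoring every unobserved node set is therefore impractical, and a sampled negative set about as large as the positive set is used instead. Negatives drawn this way tend to be far from any positive, which is the weakness the abstract attributes to random selection.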
