Enhancing Recommender Systems: A Strategy to Mitigate False Negative Impact
Kexin Shi, Yun Zhang, Bingyi Jing, Wenjia Wang
TL;DR
The paper tackles the over-fitting that hard negative sampling causes in implicit collaborative filtering and attributes it to the incorrect selection of false negatives. It analyzes positive mixing and introduces Positive-Dominated Negative Synthesizing (PDNS), which synthesizes hard negatives dominated by positive information: $\tilde{e}_{j'} = \alpha e_i + (1-\alpha) e_j$ with $\alpha > 0.7$, where $e_i$ is the positive item's embedding and $e_j$ is the sampled hard negative's embedding. PDNS can equivalently be implemented as a simple soft-BPR loss, $L_{BPR}^{soft} = - \sum \ln \sigma\left(\beta\left( y_{ui}-y_{uj} \right)\right)$, where $\beta$ is a soft factor. This flattens the loss landscape and reduces gradient magnitudes on the hardest negatives, improving robustness to false negatives. Empirically, PDNS yields consistent gains across three real-world datasets with LightGCN and MF (average Recall@50 improvements of about 3.98% for LightGCN and 4.72% for MF), with efficiency comparable to existing hard-negative methods. The approach is model-agnostic and easy to integrate, offering a practical, scalable way to mitigate false-negative effects in negative sampling for recommender systems.
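To make the link between positive mixing and the soft factor concrete, here is a minimal NumPy sketch. It assumes inner-product scores, $y_{ui} = e_u^\top e_i$; under that assumption the synthesized negative scores $y_{uj'} = \alpha y_{ui} + (1-\alpha) y_{uj}$, so $y_{ui} - y_{uj'} = (1-\alpha)(y_{ui} - y_{uj})$ and the soft factor works out to $\beta = 1 - \alpha$. This relation is derived here from the stated assumption rather than quoted from the paper, and the function names, random embeddings, and $\alpha = 0.8$ are illustrative only.

```python
import numpy as np

def soft_bpr_loss(y_pos, y_neg, beta=1.0):
    """Soft-BPR loss: -sum ln sigma(beta * (y_pos - y_neg)).

    Uses ln sigma(x) = -logaddexp(0, -x) for numerical stability.
    beta = 1.0 gives the standard BPR loss.
    """
    margin = beta * (y_pos - y_neg)
    return np.sum(np.logaddexp(0.0, -margin))

def synthesize_negative(e_pos, e_neg, alpha):
    """Positive-dominated mixing: alpha * e_i + (1 - alpha) * e_j."""
    return alpha * e_pos + (1.0 - alpha) * e_neg

# Toy embeddings (hypothetical data): 4 (user, positive, negative) triples, dim 8.
rng = np.random.default_rng(0)
e_u = rng.normal(size=(4, 8))
e_i = rng.normal(size=(4, 8))
e_j = rng.normal(size=(4, 8))
alpha = 0.8  # illustrative value satisfying alpha > 0.7

# Inner-product scores (assumption of this sketch).
y_ui = np.sum(e_u * e_i, axis=1)
y_uj = np.sum(e_u * e_j, axis=1)

# Route 1: standard BPR against the synthesized negative.
e_syn = synthesize_negative(e_i, e_j, alpha)
y_syn = np.sum(e_u * e_syn, axis=1)
loss_mix = soft_bpr_loss(y_ui, y_syn, beta=1.0)

# Route 2: soft-BPR against the original negative with beta = 1 - alpha.
loss_soft = soft_bpr_loss(y_ui, y_uj, beta=1.0 - alpha)
```

Under the same assumption, the gradient of each term with respect to $y_{uj}$ is $\beta\,\sigma(-\beta(y_{ui} - y_{uj}))$, which is bounded by $\beta$. With $\alpha > 0.7$ that bound is below 0.3, compared with 1 for standard BPR, which is one way to read the claim that gradients on the hardest negatives shrink.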
Abstract
In the implicit collaborative filtering (CF) task of recommender systems, recent works mainly focus on model structure design with promising techniques like graph neural networks (GNNs). Effective and efficient negative sampling methods that suit these models, however, remain underdeveloped. One challenge is that existing hard negative samplers tend to suffer from more severe over-fitting during model training. In this work, we first study the reason behind the over-fitting and, supported by experiments, attribute it to the incorrect selection of false negative instances. In addition, we empirically observe a counter-intuitive phenomenon: polluting hard negative samples' embeddings with a fairly large proportion of positive samples' embeddings leads to remarkable gains in prediction accuracy. Building on this finding, we present a novel negative sampling strategy, positive-dominated negative synthesizing (PDNS). Moreover, we provide theoretical analysis and derive a simple equivalent algorithm of PDNS in which only a soft factor is added to the loss function. Comprehensive experiments on three real-world datasets demonstrate the superiority of our proposed method in terms of both effectiveness and robustness.
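The "polluting" step described above can be sketched as follows. This is only an illustration under stated assumptions: the hard negative is chosen as the highest-scoring item in a small candidate pool (a common hard-sampling heuristic, which may differ from the sampler the authors use), scores are inner products, and all names and values (`pick_hardest_negative`, `positive_dominated_negative`, `alpha=0.8`, the random embeddings) are hypothetical.

```python
import numpy as np

def pick_hardest_negative(e_u, candidates):
    """Return the candidate embedding with the highest inner-product score.

    Assumption of this sketch: "hard" means highest predicted score
    among a pool of sampled un-interacted items.
    """
    scores = candidates @ e_u  # shape: (num_candidates,)
    return candidates[np.argmax(scores)]

def positive_dominated_negative(e_u, e_i, candidates, alpha=0.8):
    """Mix the hardest candidate with the positive embedding.

    With alpha > 0.5 the positive embedding dominates the mixture.
    """
    e_hard = pick_hardest_negative(e_u, candidates)
    return alpha * e_i + (1.0 - alpha) * e_hard

# Toy data (hypothetical): one user, one positive item, ten candidate negatives.
rng = np.random.default_rng(42)
dim = 16
e_u = rng.normal(size=dim)
e_i = rng.normal(size=dim)
candidates = rng.normal(size=(10, dim))

e_hard = pick_hardest_negative(e_u, candidates)
e_syn = positive_dominated_negative(e_u, e_i, candidates, alpha=0.8)
```

If the selected hard negative happens to be a false negative, the large weight on the positive embedding limits how strongly that item is pushed away from the user, which is the intuition behind the robustness claim.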
