Adaptive Hardness Negative Sampling for Collaborative Filtering
Riwei Lai, Rui Chen, Qilong Han, Chi Zhang, Li Chen
TL;DR
This work addresses a limitation of fixed-hardness negative sampling in implicit collaborative filtering, which gives rise to the false positive problem (FPP) and the false negative problem (FNP) during training. It introduces Adaptive Hardness Negative Sampling (AHNS), a paradigm with three criteria for adapting negative sample hardness to each positive example and over the course of training, and presents AHNS_{p<0} as a concrete instantiation that uses two-pass candidate sampling and a rating function parameterized by $\alpha$, $\beta$, and $p<0$. The authors prove that AHNS_{p<0} satisfies the criteria and yields a larger lower bound on NDCG than fixed-hardness methods, and extensive experiments show consistent gains across four datasets. The results demonstrate the practical viability and efficiency of adaptive hardness sampling and point to principled negative sampling design as a direction for improving implicit CF performance.
Abstract
Negative sampling is essential for implicit collaborative filtering: it provides the negative training signals needed to achieve desirable performance. We experimentally unveil a limitation common to all existing negative sampling methods: they can only select negative samples of a fixed hardness level, which leads to the false positive problem (FPP) and the false negative problem (FNP). We then propose a new paradigm called adaptive hardness negative sampling (AHNS) and discuss its three key criteria. By adaptively selecting negative samples of appropriate hardness during training, AHNS can mitigate the impacts of FPP and FNP. Next, we present a concrete instantiation of AHNS called AHNS_{p<0}, and theoretically demonstrate that AHNS_{p<0} fits the three criteria of AHNS well and achieves a larger lower bound of normalized discounted cumulative gain (NDCG). We also note that existing negative sampling methods can be regarded as more relaxed cases of AHNS. Finally, we conduct comprehensive experiments, and the results show that AHNS_{p<0} consistently and substantially outperforms several state-of-the-art competitors on multiple datasets.
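To make the idea of adaptive hardness concrete, the following is a minimal sketch rather than the authors' implementation. This section does not state the rating function, so the sketch assumes one plausible form: among a set of sampled candidates, pick the negative whose predicted score is closest to a target $\beta (s^{+} + \alpha)^{p+1}$, where $s^{+}$ is the predicted score of the positive. The function names, default parameter values, and example scores are all hypothetical.

```python
def target_negative_score(pos_score, alpha, beta, p):
    """Assumed target score for the negative: beta * (pos_score + alpha) ** (p + 1).

    This functional form is an assumption made for illustration; only the
    parameters alpha, beta, and p < 0 are named in the text above.
    """
    if p >= 0:
        raise ValueError("p must be negative")
    base = pos_score + alpha
    if base <= 0:
        raise ValueError("pos_score + alpha must be positive")
    return beta * base ** (p + 1)


def select_negative(pos_score, candidate_scores, alpha=1.0, beta=0.5, p=-2.0):
    """Return the index of the candidate whose score is closest to the target.

    candidate_scores holds the predicted scores of already-sampled candidate
    negatives; how candidates are sampled is outside this sketch.
    """
    if not candidate_scores:
        raise ValueError("need at least one candidate")
    target = target_negative_score(pos_score, alpha, beta, p)
    return min(
        range(len(candidate_scores)),
        key=lambda i: abs(candidate_scores[i] - target),
    )


# Hypothetical predicted scores for four candidate negatives.
candidates = [0.05, 0.3, 0.6, 0.9]

# Low-scoring positive: target is 0.5, so the candidate scoring 0.6 is chosen.
print(select_negative(0.0, candidates))  # 2

# High-scoring positive: target is 0.1, so the candidate scoring 0.05 is chosen.
print(select_negative(4.0, candidates))  # 0
```

Under this assumed rule with $p=-2$, the target score falls as the positive's score rises, so the hardness of the selected negative varies per positive instead of staying fixed. A fixed-hardness sampler would correspond to always choosing, say, the highest-scoring candidate regardless of $s^{+}$.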
