Towards Reliable Negative Sampling for Recommendation with Implicit Feedback via In-Community Popularity

Chen Chen, Haobo Lin, Yuanbo Xu

TL;DR

This work tackles the unreliable nature of negative sampling under implicit feedback by tying negative selection to exposure through latent user communities. The proposed ICPNS framework identifies communities from pretrained embeddings and uses in-community item popularity, smoothed by a parameter $\alpha$, to sample negatives, ensuring realness, hardness, and interpretability. A two-stage training pipeline—pretraining with random negatives followed by fine-tuning with ICPNS—yields consistent gains on graph-based recommender models and competitive results on MF-based models across four diverse datasets, with efficient $O(1)$ sampling via the Alias method. The approach offers a principled, scalable, and interpretable perspective on negative sampling by explicitly connecting it to exposure modeling, and it provides valuable insights into how community structure and smoothing influence recommendation performance.
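The $O(1)$ sampling claim rests on the Alias method, which trades a one-time linear setup for constant-time draws from a fixed discrete distribution. The following is a minimal sketch of Vose's variant of that method, not the authors' implementation; the function names `build_alias_table` and `alias_draw` are hypothetical, and the weights stand in for whatever per-community item distribution ICPNS produces.

```python
import random


def build_alias_table(weights):
    """Build alias-method tables in O(n) so each later draw costs O(1).

    `weights` are non-negative, not necessarily normalized.
    Returns (prob, alias), two lists of length n.
    """
    n = len(weights)
    total = float(sum(weights))
    # Rescale so the average bucket height is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # bucket s keeps its own mass ...
        alias[s] = l          # ... and is topped up by item l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftovers (including floating-point stragglers) fill their own bucket.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng=random):
    """Draw one index in O(1): pick a bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

In an ICPNS-style setup one would presumably build one table per community after clustering and reuse it for every draw until the communities or popularity counts change.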

Abstract

Learning from implicit feedback is a fundamental problem in modern recommender systems, where only positive interactions are observed and explicit negative signals are unavailable. In such settings, negative sampling plays a critical role in model training by constructing negative items that enable effective preference learning and ranking optimization. However, designing reliable negative sampling strategies remains challenging, as they must simultaneously ensure realness, hardness, and interpretability. To this end, we propose ICPNS (In-Community Popularity Negative Sampling), a novel framework that leverages user community structure to identify reliable and informative negative samples. Our approach is grounded in the insight that item exposure is driven by latent user communities. By identifying these communities and utilizing in-community popularity, ICPNS effectively approximates the probability of item exposure. Consequently, items that are popular within a user's community but remain unclicked are identified as more reliable true negatives. Extensive experiments on four benchmark datasets demonstrate that ICPNS yields consistent improvements on graph-based recommenders and competitive performance on MF-based models, outperforming representative negative sampling strategies under a unified evaluation protocol.
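The core idea, that items popular within a user's community but unclicked by that user make more reliable negatives, can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the paper's exact formulation: community labels are taken as given (the paper derives them from pretrained embeddings), the smoothing is assumed to be a power transform `count ** alpha`, and the names `in_community_popularity` and `sample_negative` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def in_community_popularity(interactions, community_of, alpha=0.75):
    """Per-community item distribution proportional to count ** alpha.

    ASSUMPTION: power smoothing; the source only says popularity is
    "smoothed by a parameter alpha". alpha=1 keeps raw popularity,
    alpha=0 makes all items seen in the community equally likely.
    """
    counts = defaultdict(Counter)
    for user, item in interactions:
        counts[community_of[user]][item] += 1
    dist = {}
    for community, ctr in counts.items():
        weights = {item: n ** alpha for item, n in ctr.items()}
        z = sum(weights.values())
        dist[community] = {item: w / z for item, w in weights.items()}
    return dist


def sample_negative(user, user_pos, community_of, dist, rng=random, max_tries=100):
    """Draw a community-popular item the user has not interacted with.

    Returns None if no unclicked item is found within max_tries.
    """
    d = dist[community_of[user]]
    items = list(d)
    weights = [d[i] for i in items]
    for _ in range(max_tries):
        item = rng.choices(items, weights=weights, k=1)[0]
        if item not in user_pos[user]:
            return item
    return None
```

For brevity the sketch draws with `random.choices`, which is not constant-time; a production sampler would swap in a precomputed alias table per community.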

Paper Structure (47 sections, 19 equations, 9 figures, 7 tables, 1 algorithm)

Figures (9)

  • Figure 1: Overview of negative sampling ambiguity and the motivation of ICPNS. (A) Two types of unobserved items under implicit feedback. Triangles with darker colors indicate higher sample hardness; as hardness increases, the likelihood of false negatives also increases. (B) The inherent ambiguity of negative samples: true negatives arising from genuine disinterest and false negatives caused by lack of exposure. ICPNS approximates item exposure using in-community popularity, treating unclicked but community-popular items as more reliable negatives.
  • Figure 2: The detailed framework of ICPNS. After the encoder has been pretrained with RNS, the model switches to the ICPNS strategy: users are clustered into latent communities based on their pretrained embeddings, and in-community popularity is used to sample reliable negative items.
  • Figure 3: Comparison of training time per epoch across datasets. The y-axis is plotted on a logarithmic scale. To improve readability, small and large datasets are reported separately. ICPNS consistently achieves lower computational overhead compared to HNS.
  • Figure 4: Comparison of hardness across different negative sampling strategies, averaged over all training epochs across four datasets.
  • Figure 5: Parameter sensitivity analysis of ICPNS on Beauty and ML-100K. The top row (a–b) illustrates the impact of $P$, while the bottom row (c–d) displays the effect of $\alpha$.
  • ...and 4 more figures