Personalized Denoising Implicit Feedback for Robust Recommender System

Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, Xueqi Cheng

TL;DR

This paper tackles noise in implicit feedback for recommender systems by showing that while noisy and normal interactions overlap in the overall loss distribution, each user exhibits a clear separation in their personal loss distribution. It introduces PLD, a denoising framework that builds a per-user candidate pool and resamples training targets from a softmax over personalized losses with a temperature parameter, biasing learning toward normal interactions. The authors provide theoretical guarantees demonstrating when PLD outperforms standard training and validate it with extensive experiments on Gowalla, Yelp2018, MIND, and MIND-Large, using MF and LightGCN backbones; PLD achieves state-of-the-art results across BCE and BPR losses and remains robust to varying noise ratios. The work offers practical guidance on hyperparameters (e.g., candidate pool size $k$ and temperature $\tau$) and shows that PLD can be integrated with existing backbones with modest computational overhead, improving both accuracy and resilience to noise in implicit-feedback scenarios.

Abstract

While implicit feedback is foundational to modern recommender systems, factors such as human error, uncertainty, and ambiguity in user behavior inevitably introduce significant noise into this feedback, adversely affecting the accuracy and robustness of recommendations. To address this issue, existing methods typically aim to reduce the training weight of noisy feedback or discard it entirely, based on the observation that noisy interactions often exhibit higher losses in the overall loss distribution. However, we identify two key issues: (1) there is a significant overlap between normal and noisy interactions in the overall loss distribution, and (2) this overlap becomes even more pronounced when transitioning from pointwise loss functions (e.g., BCE loss) to pairwise loss functions (e.g., BPR loss). This overlap leads traditional methods to misclassify noisy interactions as normal, and vice versa. To tackle these challenges, we further investigate the loss overlap and find that for a given user, there is a clear distinction between normal and noisy interactions in the user's personal loss distribution. Based on this insight, we propose a resampling strategy to Denoise using the user's Personal Loss distribution, named PLD, which reduces the probability of noisy interactions being optimized. Specifically, during each optimization iteration, we create a candidate item pool for each user and resample the items from this pool based on the user's personal loss distribution, prioritizing normal interactions. Additionally, we conduct a theoretical analysis to validate PLD's effectiveness and suggest ways to further enhance its performance. Extensive experiments conducted on three datasets with varying noise ratios demonstrate PLD's efficacy and robustness.
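The resampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `resample_positive`, the `loss_of` callable, the default values of `k` and `tau`, and the toy losses are all hypothetical. The sketch also assumes that the softmax is taken over negative losses, so that lower-loss interactions (more likely normal, per the abstract's observation that noisy interactions tend to have higher losses) are drawn more often.

```python
import math
import random


def resample_positive(user_items, loss_of, k=5, tau=0.1, rng=random):
    """Pick one training item for a user, favoring low-loss interactions.

    user_items: items the user interacted with (may include noisy ones).
    loss_of:    hypothetical callable mapping an item to its current loss.
    k:          candidate pool size; tau: softmax temperature.
    """
    # Step 1: draw a candidate pool uniformly from the user's interactions.
    pool = rng.sample(user_items, min(k, len(user_items)))
    # Step 2: softmax over negative losses, computed only within this pool,
    # so items are compared against the same user's other items.
    losses = [loss_of(item) for item in pool]
    lowest = min(losses)  # shift by the minimum for numerical stability
    weights = [math.exp(-(loss - lowest) / tau) for loss in losses]
    # Step 3: sample one item in proportion to its weight.
    return rng.choices(pool, weights=weights, k=1)[0]


# Toy usage: three low-loss items and one high-loss item (assumed noisy).
rng = random.Random(0)
toy_losses = {"a": 0.2, "b": 0.3, "c": 0.25, "noisy": 2.0}
picks = [
    resample_positive(list(toy_losses), toy_losses.get, k=4, tau=0.1, rng=rng)
    for _ in range(1000)
]
print({item: picks.count(item) for item in toy_losses})
```

In this sketch a smaller `tau` concentrates the draws on the lowest-loss candidates, while a larger `tau` moves the draw toward uniform sampling over the pool. Because the weights are normalized within one user's pool, a user whose losses are high across the board is still compared only against their own interactions.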

Paper Structure

This paper contains 24 sections, 3 theorems, 9 equations, 9 figures, 9 tables, 1 algorithm.

Key Result

Theorem 1

For a user $u$, there are $n$ items with normal interactions and $m$ items with noisy interactions. Suppose the loss of each normal interaction follows a distribution with mean $\mu_1$ and variance $\sigma^2$, and the loss of each noisy interaction follows a distribution with mean $\mu_2$ and variance [...]. Then we have: [...], where $C \in [\beta, \alpha]$ is a constant term.

Figures (9)

  • Figure 1: Probability distribution of losses. The overlap region includes interactions that deviate from the common assumption in existing methods, i.e., where noisy interactions exhibit lower losses or normal interactions exhibit higher losses. Quartiles are used instead of max-min values to mitigate the influence of extreme values when determining the overlap region.
  • Figure 2: Probability distribution of losses.
  • Figure 3: Personal loss distribution for five users.
  • Figure 4: Difference between normal and noisy interactions in personal loss distributions across all users.
  • Figure 5: Recommendation performance of different denoising methods across various noise ratios.
  • ...and 4 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Proposition 1
  • Proposition 2