Evaluating and Addressing Fairness Across User Groups in Negative Sampling for Recommender Systems

Yueqing Xuan; Kacper Sokol; Mark Sanderson; Jeffrey Chan

TL;DR

The paper examines fairness gaps in negative sampling for implicit-feedback recommenders and shows that active users consistently receive more informative negatives and better recommendations than inactive users. It systematically evaluates eight samplers across four datasets and finds that raising the negative sampling ratio globally improves average performance but disproportionately benefits active users. To address this, it introduces a group-specific ratio framework, optimized via Hyperband with informed priors, which yields notable gains for inactive users and in overall accuracy. The work also provides practical guidance for fairness-aware negative sampling, along with insights into cold-start implications and efficient hyperparameter search. The findings are most relevant to recommenders whose interaction data is dominated by a small group of active users.

Abstract

Recommender systems trained on implicit feedback data rely on negative sampling to distinguish positive items from negative items for each user. Since the majority of positive interactions come from a small group of active users, negative samplers are often affected by data imbalance: they choose more informative negatives for active users and less useful ones for inactive users. As a result, inactive users are further marginalised in the training process and receive inferior recommendations. In this paper, we conduct a comprehensive empirical study demonstrating that state-of-the-art negative sampling strategies provide more accurate recommendations for active users than for inactive users. We also find that increasing the number of negative samples for each positive item improves the average performance, but the benefit is distributed unequally across user groups: active users experience a performance gain while inactive users suffer a performance degradation. To address this, we propose a group-specific negative sampling strategy that assigns smaller negative ratios to inactive user groups and larger ratios to active groups. Experiments on eight negative samplers show that our approach improves user-side fairness and performance when compared to a uniform global ratio.
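
The group-specific strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-size partition by interaction count, the uniform random sampler, the function names and the example ratios are all assumptions made for illustration. The only idea taken from the text is that each user group gets its own negative sampling ratio, smaller for inactive groups and larger for active ones.

```python
import random


def partition_users_by_activity(interactions, n_groups=4):
    """Assign each user to a group by interaction count (assumed scheme).

    Users are ranked by number of positives and cut into n_groups parts of
    roughly equal size. Group 0 is the least active, n_groups - 1 the most.
    """
    ranked = sorted(interactions, key=lambda u: (len(interactions[u]), u))
    return {
        user: rank * n_groups // len(ranked)
        for rank, user in enumerate(ranked)
    }


def sample_negatives(interactions, n_items, group_of, ratio_per_group, seed=0):
    """Draw (user, positive, negative) triples with a per-group ratio.

    Negatives are drawn uniformly from items the user has not interacted
    with; a uniform sampler is used here only to keep the sketch short.
    """
    rng = random.Random(seed)
    triples = []
    for user, positives in interactions.items():
        ratio = ratio_per_group[group_of[user]]
        candidates = [i for i in range(n_items) if i not in positives]
        if not candidates:  # user interacted with every item
            continue
        for pos in sorted(positives):
            for _ in range(ratio):
                triples.append((user, pos, rng.choice(candidates)))
    return triples


# Toy data: user id -> set of positive item ids (hypothetical values).
interactions = {0: {0}, 1: {1, 2}, 2: {0, 1, 2}, 3: {0, 1, 2, 3, 4}}
# Hypothetical ratios: fewer negatives per positive for less active groups.
ratio_per_group = {0: 1, 1: 2, 2: 4, 3: 8}

group_of = partition_users_by_activity(interactions, n_groups=4)
triples = sample_negatives(interactions, 10, group_of, ratio_per_group)
print(len(triples))
```

In practice the per-group ratios would be tuned rather than fixed by hand; the TL;DR above mentions Hyperband with informed priors for that search, which this sketch does not attempt to reproduce.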

Paper Structure (30 sections, 3 equations, 4 figures, 4 tables)

Introduction
Related Work and Preliminaries
Negative Sampling for Recommendation
Fairness in Recommender Systems
Preliminaries
Evaluating Negative Samplers
User Partitioning
RQ1: Performance Disparity
RQ2: Performance Change for Larger Ratios
RQ3: Performance Change Distribution
Experimental Setup
Negative Samplers
Datasets
Evaluation Metrics
Hyperparameter Setting
...and 15 more sections

Figures (4)

Figure 1: Percentage variation between the NDCG of four user groups compared to the average NDCG value across all users (dashed line) for each dataset. All negative samplers are run with the LightGCN model.
Figure 2: Performance of the recommenders with different negative sampling strategies under different negative sampling ratios across datasets. Samplers in the panels labelled fig:ml_mf through fig:pinterest_mf are integrated with MF, and those in the panels labelled fig:ml_light through fig:pinterest_light with LightGCN.
Figure 3: Relative NDCG change for each user group when $\mathcal{K}>1$ compared to performance at $\mathcal{K}=1$. For brevity, we report results for four representative samplers integrated with MF on two datasets. Full results, which follow the same patterns, are available in the dedicated code repository. Group 1 captures the least and Group 4 the most active users.
Figure 4: Proportion of good samples (panel fig:beauty_good) and bad samples (panel fig:beauty_bad), as well as training effectiveness (panel fig:beauty_effectiveness), for each user group in the Amazon Beauty dataset with DNS and MF. Calculation details can be found in the original paper [wu2021rethinking].
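
The quantities plotted in Figures 1 and 3 can be sketched as follows. This is one plausible reading of the captions, not the paper's evaluation code: the function names, the toy NDCG values and the assumption that both quantities are simple percentage differences are illustrative only.

```python
from collections import defaultdict


def group_means(per_user_ndcg, group_of):
    """Average the per-user NDCG within each user group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for user, score in per_user_ndcg.items():
        sums[group_of[user]] += score
        counts[group_of[user]] += 1
    return {g: sums[g] / counts[g] for g in sums}


def pct_variation_from_average(per_user_ndcg, group_of):
    """Figure 1 style: group NDCG relative to the all-user average, in %.

    Assumes the overall average NDCG is non-zero.
    """
    overall = sum(per_user_ndcg.values()) / len(per_user_ndcg)
    means = group_means(per_user_ndcg, group_of)
    return {g: 100.0 * (m - overall) / overall for g, m in means.items()}


def relative_change(ndcg_at_k, ndcg_at_1):
    """Figure 3 style: per-group NDCG change for a ratio K > 1 vs K = 1, in %."""
    return {
        g: 100.0 * (ndcg_at_k[g] - ndcg_at_1[g]) / ndcg_at_1[g]
        for g in ndcg_at_1
    }


# Hypothetical per-user NDCG values; Group 1 is least, Group 4 most active.
per_user_ndcg = {"u1": 0.1, "u2": 0.2, "u3": 0.3, "u4": 0.4}
group_of = {"u1": 1, "u2": 1, "u3": 4, "u4": 4}
variation = pct_variation_from_average(per_user_ndcg, group_of)

# Hypothetical group-level NDCG at K = 1 and at some larger K.
change = relative_change({1: 0.18, 4: 0.46}, {1: 0.20, 4: 0.40})
print(variation, change)
```

A negative value for a group in either dictionary means that group is below the reference (the all-user average, or its own performance at K = 1).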

Theorems & Definitions (2)

Definition 1: Performance Change Distribution Disparity
Definition 2: Unequal Distribution Persistence
