SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

TL;DR

This work addresses the inefficiencies and suboptimality of traditional pairwise losses in collaborative filtering by introducing SimCE, an upper-bound-based simplification of the Sampled Softmax Cross-Entropy loss. SimCE retains the benefits of using multiple negative samples while focusing updates on the hardest negative, yielding faster convergence and solid performance gains over BPR and SSM across 12 diverse datasets and two backbones. The results demonstrate that thoughtful loss-function design, particularly around negative sampling and margin, can substantially improve both accuracy (Recall/NDCG) and training efficiency in large-scale recommender systems. The approach is presented as easily integrable into existing frameworks, with practical guidance on negative-sample size and margin settings, and shows broad applicability to MF and graph-based backbones like LightGCN.

Abstract

The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address this issue, the recently proposed Sampled Softmax Cross-Entropy (SSM) compares one positive sample with multiple negative samples, leading to better performance. Our comprehensive experiments confirm that recommender systems consistently benefit from multiple negative samples during training. Furthermore, we introduce a Simplified Sampled Softmax Cross-Entropy loss (SimCE), which simplifies the SSM using its upper bound. Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.
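
To make the relationship among the three objectives concrete, the display below writes them for a single user $u$ with positive item $i$, sampled negatives $\mathcal{N}$, and predicted scores $\hat{y}_{u\cdot}$. The first line restates BPR (one negative $j$) and the equivalent log-form of SSM; the bound in the second line is the standard LogSumExp inequality $\sum_{j} e^{x_j} \le |\mathcal{N}| \max_{j} e^{x_j}$, which is our reading of how SimCE collapses the sum over negatives onto the hardest one (the paper's exact margin parameterization may differ):

$$\mathcal{L}_{\mathrm{BPR}} = -\log \sigma\bigl(\hat{y}_{ui} - \hat{y}_{uj}\bigr), \qquad \mathcal{L}_{\mathrm{SSM}} = \log\Bigl(1 + \sum_{j \in \mathcal{N}} e^{\hat{y}_{uj} - \hat{y}_{ui}}\Bigr),$$

$$\mathcal{L}_{\mathrm{SSM}} \le \log\Bigl(1 + |\mathcal{N}|\, e^{\max_{j \in \mathcal{N}} \hat{y}_{uj} - \hat{y}_{ui}}\Bigr).$$

Minimizing the upper bound only requires the positive score to exceed that of the hardest sampled negative, which is consistent with both the reported faster convergence and the role of the margin $\gamma$ studied in Figure 4 (the constant $\log|\mathcal{N}|$ acts as a fixed offset of the same kind).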

Paper Structure

This paper contains 20 sections, 8 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: PyTorch-style pseudo-code for three loss functions: BPR, SSM, and SimCE (a minimal sketch in the same spirit follows this list).
  • Figure 2: Training curves of different loss functions in terms of Recall@20.
  • Figure 3: The impact of the number of negative samples $|\mathcal{N}|$ for both SSM and SimCE in terms of Recall@20.
  • Figure 4: The impact of different margin values $\gamma$ on Gowalla, iFashion, Yelp, Kindle, Book and Movies datasets.
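
Figure 1 gives PyTorch-style pseudo-code for the three losses. Since the figure itself is not reproduced here, the snippet below is a minimal sketch in the same spirit; the tensor shapes, the softplus form of SimCE, and the placement of the margin argument are illustrative assumptions, not a transcription of the paper's code.

    import torch
    import torch.nn.functional as F

    def bpr_loss(pos_score, neg_score):
        # BPR: one positive vs. one sampled negative per interaction.
        # pos_score, neg_score: (B,) predicted scores.
        return -F.logsigmoid(pos_score - neg_score).mean()

    def ssm_loss(pos_score, neg_scores):
        # SSM: softmax cross-entropy where the positive (column 0)
        # competes against N sampled negatives. neg_scores: (B, N).
        logits = torch.cat([pos_score.unsqueeze(1), neg_scores], dim=1)
        return -F.log_softmax(logits, dim=1)[:, 0].mean()

    def simce_loss(pos_score, neg_scores, margin=1.0):
        # SimCE (our reading): keep only the hardest negative and
        # minimize softplus(max_j s_j + margin - s_pos), i.e. SSM's
        # upper bound with the log|N| constant absorbed into `margin`.
        hardest = neg_scores.max(dim=1).values
        return F.softplus(hardest + margin - pos_score).mean()

All three functions consume raw inner-product scores from an MF or LightGCN backbone, so swapping one loss for another in an existing training loop is a one-line change, consistent with the paper's claim that SimCE integrates easily into existing frameworks.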