RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

Danil Gusak; Gleb Mezentsev; Ivan Oseledets; Evgeny Frolov

TL;DR

Full Cross-Entropy loss delivers state-of-the-art accuracy in sequential recommender models but is memory-prohibitive for large item catalogs. The paper introduces RECE, a GPU-friendly reduced Cross-Entropy loss that uses hashed bucketing and hard-negative mining to approximate CE without constructing the full logit tensor. It reports up to 12x lower peak training memory while preserving or improving ranking metrics across multiple datasets, with strong performance on large catalogs and competitive results on smaller ones. The approach broadens the practicality of CE-like objectives and has potential applications beyond recommender systems, including NLP and search.
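To put "memory-prohibitive" in numbers, the sketch below estimates the size of the dense fp32 logit tensor that full CE materializes in a sequential model, where every position of every sequence is scored against the whole catalog. The sizes (batch 128, sequence length 200, one million items, 256 negatives per position) are hypothetical, chosen only for illustration and not taken from the paper's experiments; the helper `logits_bytes` is likewise hypothetical, and the estimate ignores gradients, bucketing overhead, and all other activations.

```python
def logits_bytes(batch_size, seq_len, n_logits_per_position, bytes_per_value=4):
    """Bytes needed for a dense [batch, seq_len, n_logits] tensor (fp32 by default)."""
    return batch_size * seq_len * n_logits_per_position * bytes_per_value


# Hypothetical sizes for illustration only (not from the paper).
BATCH, SEQ_LEN, CATALOG, NEGATIVES = 128, 200, 1_000_000, 256

full = logits_bytes(BATCH, SEQ_LEN, CATALOG)           # every item is scored
reduced = logits_bytes(BATCH, SEQ_LEN, 1 + NEGATIVES)  # positive + negatives only

print(f"full CE logits: {full / 1e9:.1f} GB")     # 102.4 GB
print(f"reduced logits: {reduced / 1e6:.1f} MB")  # 26.3 MB
```

The full tensor grows linearly with catalog size, while a loss that scores only a fixed number of candidates per position does not depend on the catalog size at all, which is the gap a reduced CE objective aims to exploit.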

Abstract

Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating the large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while retaining the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts peak training memory usage by up to 12 times compared to existing methods while retaining or exceeding the performance of full CE loss. The approach also opens up new possibilities for large-scale applications in other domains.
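The abstract describes the method only at a high level. As a rough illustration of the general idea (bucket items and queries with an LSH-like hash, then score each query only against its positive and a few hard negatives from its own bucket), here is a minimal pure-Python sketch. It is not the paper's Algorithm 1: the argmax-over-random-directions bucketing, the top-`k` same-bucket selection, and the names `reduced_ce`, `bucket_ids`, `n_buckets`, and `k` are all illustrative assumptions, and a real GPU implementation would use batched tensor operations instead of Python loops.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def full_ce(queries, items, targets):
    """Mean full cross-entropy: every query is scored against every item."""
    total = 0.0
    for q, t in zip(queries, targets):
        logits = [dot(q, e) for e in items]
        total += logsumexp(logits) - logits[t]
    return total / len(queries)


def bucket_ids(vectors, directions):
    """LSH-like hash (assumed): assign each vector to the random direction it aligns with most."""
    return [max(range(len(directions)), key=lambda b: dot(v, directions[b]))
            for v in vectors]


def reduced_ce(queries, items, targets, n_buckets, k, seed=0):
    """Mean CE over the positive plus the k highest-scoring negatives
    from the query's own bucket (hypothetical simplification)."""
    rng = random.Random(seed)
    dim = len(items[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_buckets)]

    # Group item indices by bucket so each query only sees its own bucket.
    members = {}
    for i, b in enumerate(bucket_ids(items, directions)):
        members.setdefault(b, []).append(i)

    total = 0.0
    for q, t, qb in zip(queries, targets, bucket_ids(queries, directions)):
        candidates = [i for i in members.get(qb, []) if i != t]
        # Hard-negative mining: keep only the k largest same-bucket logits.
        hard_negatives = sorted((dot(q, items[i]) for i in candidates), reverse=True)[:k]
        pos_logit = dot(q, items[t])
        total += logsumexp([pos_logit] + hard_negatives) - pos_logit
    return total / len(queries)


if __name__ == "__main__":
    rng = random.Random(42)
    items = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
    queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
    targets = [rng.randrange(len(items)) for _ in range(32)]
    print("full CE:   ", full_ce(queries, items, targets))
    print("reduced CE:", reduced_ce(queries, items, targets, n_buckets=8, k=16))
```

With `n_buckets=1` and `k` at least the catalog size minus one, every item is a candidate and the sketch reproduces full CE. Otherwise the softmax denominator holds a subset of the full terms, so the reduced loss is never larger than full CE; keeping the highest-scoring negatives is what makes that subset a good approximation, since those terms dominate the full denominator.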

Paper Structure (9 sections, 4 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Reduced Cross-Entropy
Experimental Settings
Datasets
Evaluation
Model and Baselines
Results
Conclusion

Figures (3)

Figure 1: Impact of different components on peak GPU memory usage during SASRec training with Cross-Entropy loss. Measurements were conducted using PyTorch profiling tools.
Figure 2: Temporal data splitting strategy.
Figure 3: (a)-(d) Pareto front curves illustrating NDCG@10 for different memory budgets. (e)-(h) The same points plotted on NDCG@10 vs. training time axes, with point sizes representing the memory values from the corresponding (a)-(d) plots.