DSL: Understanding and Improving Softmax Recommender Systems with Competition-Aware Scaling

Bucher Sahyouni; Matthew Vowels; Liqun Chen; Simon Hadfield

TL;DR

DSL targets instability in softmax-based recommender training caused by a single global temperature and uniformly sampled negatives. It introduces two complementary branches: a within-example κ-branch that reweights negatives using hardness and item–item similarity, and a competition-aware (CA) branch that forms a top competitor slate and assigns per-example temperatures based on competition intensity, all while normalising to avoid logit drift. The approach yields distributionally robust improvements, supported by a KL-DRO interpretation and metric-aligned gradient estimates, with empirical gains across multiple datasets and backbones, especially under distribution shifts and for tail items. The work preserves the Softmax Loss backbone while reshaping the competition geometry to focus learning on the most informative substitutes, improving both accuracy and robustness in implicit-feedback recommender systems.
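The κ-branch idea (reweighting negatives within one example by hardness and item–item similarity, with normalisation) can be illustrated with a minimal Python sketch. The summary does not give the paper's formulas, so every concrete choice below is an assumption:

- `kappa_weights` and `kappa_softmax_loss` are hypothetical helpers.
- Hardness is the within-example softmax of the negative scores.
- Similarity is clipped cosine similarity between the positive and negative item embeddings.
- κ enters as a multiplicative weight on each negative's exponentiated logit, which is equivalent to an additive $\log\kappa$ offset.

Mixing with uniform weights and rescaling so that κ averages 1 is one simple way to keep the total negative mass comparable to plain SL.

```python
import math


def softmax(xs, tau=1.0):
    """Numerically stable softmax of xs / tau."""
    m = max(x / tau for x in xs)
    exps = [math.exp(x / tau - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def kappa_weights(neg_scores, pos_emb, neg_embs, tau=0.1, beta=0.5, eps=1e-8):
    """Hypothetical per-negative weights from hardness x similarity, averaging to 1."""
    n = len(neg_scores)
    hardness = softmax(neg_scores, tau)                      # higher score -> harder
    sims = [max(cosine(pos_emb, e), 0.0) for e in neg_embs]  # near-substitutes of the positive
    raw = [h * s + eps for h, s in zip(hardness, sims)]
    z = sum(raw)
    # Mix with uniform weights; mean(kappa) == 1 by construction and every kappa > 0.
    return [(1.0 - beta) + beta * n * r / z for r in raw]


def kappa_softmax_loss(pos_score, neg_scores, kappa, tau=0.1):
    """-log(e^{s+/tau} / (e^{s+/tau} + sum_j kappa_j e^{s_j/tau})), computed stably."""
    logits = [pos_score / tau] + [s / tau + math.log(k) for s, k in zip(neg_scores, kappa)]
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - pos_score / tau


# Usage: the first negative is both high-scoring and similar to the positive item.
kappa = kappa_weights([0.9, 0.1, -0.3], [1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
loss = kappa_softmax_loss(0.5, [0.9, 0.1, -0.3], kappa)
```

With `kappa = [1.0] * n`, the loss reduces to the plain sampled Softmax Loss. The `beta` parameter (also hypothetical) controls how far the weighting departs from uniform.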

Abstract

Softmax Loss (SL) is increasingly adopted for recommender systems (RS) because it has demonstrated better performance, robustness and fairness. Yet in implicit-feedback settings, a single global temperature and equal treatment of uniformly sampled negatives can lead to brittle training, because sampled sets vary in how many relevant or informative competitors they contain. The optimal loss sharpness for a user-item pair with one set of negatives can be suboptimal or destabilising for another pair with different negatives. We introduce Dual-scale Softmax Loss (DSL), which infers effective sharpness from the sampled competition itself. DSL adds two complementary branches to the log-sum-exp backbone. First, it reweights negatives within each training instance using hardness and item–item similarity; second, it adapts a per-example temperature from the competition intensity over a constructed competitor slate. Together, these components preserve the geometry of SL while reshaping the competition distribution across negatives and across examples. Over several representative benchmarks and backbones, DSL yields substantial gains over strong baselines, with improvements over SL exceeding $10\%$ in several settings and averaging $6.22\%$ across datasets, metrics, and backbones. Under out-of-distribution (OOD) popularity shift, the gains are larger, averaging $9.31\%$ over SL. We further provide a theoretical distributionally robust optimisation (DRO) analysis showing how DSL reshapes the robust payoff and the KL deviation for ambiguous instances, which helps explain the observed improvements in accuracy and robustness.
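For reference, the SL backbone and the KL-DRO reading behind it can be checked numerically. With a single global temperature $\tau$, SL for a positive score $s^+$ and sampled negative scores $s_j$ is $-\log\big(e^{s^+/\tau} / (e^{s^+/\tau} + \sum_j e^{s_j/\tau})\big)$.

Its negative term rests on the standard Gibbs variational identity $\tau \log \mathbb{E}_P[e^{s/\tau}] = \max_Q \{\mathbb{E}_Q[s] - \tau\,\mathrm{KL}(Q\|P)\}$, attained at $Q^*(j) \propto P(j)\,e^{s_j/\tau}$. In words, it is a worst case over reweightings $Q$ of the sampled negatives, penalised by KL distance from the sampling distribution $P$, with $\tau$ setting the penalty. A single global $\tau$ therefore applies the same penalty to every example.

The sketch below assumes uniform $P$. It illustrates the general identity only, not the paper's own DRO analysis, and the function names are illustrative.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax_loss(pos_score, neg_scores, tau):
    """Sampled Softmax Loss with a single global temperature tau."""
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    return logsumexp(logits) - pos_score / tau


def kl(q, p):
    """KL(q || p) for discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def dro_value(neg_scores, tau):
    """tau * log E_P[exp(s / tau)] with P uniform over the sampled negatives."""
    n = len(neg_scores)
    return tau * (logsumexp([s / tau for s in neg_scores]) - math.log(n))


def worst_case_q(neg_scores, tau):
    """Maximiser Q*(j) proportional to P(j) * exp(s_j / tau), with P uniform."""
    m = max(neg_scores)
    w = [math.exp((s - m) / tau) for s in neg_scores]
    z = sum(w)
    return [x / z for x in w]


# Check the identity  E_{Q*}[s] - tau * KL(Q* || P) == tau * log E_P[exp(s / tau)].
negs, tau = [0.3, -0.2, 0.8, 0.1], 0.2
p = [1.0 / len(negs)] * len(negs)
q_star = worst_case_q(negs, tau)
robust = sum(qi * s for qi, s in zip(q_star, negs)) - tau * kl(q_star, p)
print(robust, dro_value(negs, tau))  # the two numbers agree up to rounding
```

Here `worst_case_q(negs, 0.2)` places most of its mass on the 0.8 negative, and a smaller `tau` concentrates it further. This is the sense in which the temperature controls how sharply SL focuses on its hardest sampled competitors.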

Paper Structure (39 sections, 39 equations, 5 figures, 4 tables)

Introduction
Related Work
Pointwise and Pairwise Objectives
Softmax-style Objectives
Preliminaries
Task Formulation
Softmax Loss
Methodology
Within Example Weighted Competition ($\kappa$ Branch)
Constructing $\kappa_{uij}$ from hardness and item--item similarity:
Per-negative logit scaling:
Competition-Aware Temperature (CA Branch)
Competitor slate:
Hardness-induced competitor distribution:
Competition intensity from (hardness $\times$ similarity):
...and 24 more sections
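The competition-aware branch outlined in the section list above (competitor slate, hardness-induced competitor distribution, intensity from hardness × similarity) can be sketched as follows. All concrete choices are assumptions rather than the paper's formulas:

- The slate is the top-`k` sampled negatives by score.
- Hardness is `sigmoid((s_j - s_pos) / tau0)`, a soft probability that the negative outranks the positive.
- Intensity is mean slate hardness times hardness-weighted similarity.
- The hypothetical helper `per_example_tau` maps intensity to temperature with an exponential. Its centred exponent keeps the batch geometric mean of τ equal to `tau0`.

The direction follows Figure 1: weak competition gives a larger τ (smoother loss), strong competition a smaller τ (sharper loss).

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def competition_intensity(pos_score, neg_scores, sims, k=5, tau0=0.1):
    """Hypothetical CA-branch intensity for one (user, positive item) example.

    sims[j] is the similarity between the positive item and sampled negative j.
    """
    # 1) Competitor slate: indices of the top-k scoring sampled negatives.
    slate = sorted(range(len(neg_scores)), key=lambda j: neg_scores[j], reverse=True)[:k]
    # 2) Hardness: soft probability that negative j outranks the positive (assumption).
    h = [sigmoid((neg_scores[j] - pos_score) / tau0) for j in slate]
    z = sum(h)
    if z == 0.0:
        return 0.0
    # 3) Hardness-induced competitor distribution over the slate.
    q = [x / z for x in h]
    # 4) Intensity = mean slate hardness x hardness-weighted similarity
    #    (algebraically, the slate average of h_j * sim_j).
    sim_term = sum(qj * max(sims[j], 0.0) for qj, j in zip(q, slate))
    return (z / len(slate)) * sim_term


def per_example_tau(intensities, tau0=0.1, gamma=2.0):
    """Hypothetical map from intensity to temperature: more competition -> smaller tau.

    Centring the exponent on the batch mean keeps the geometric mean of tau at tau0.
    """
    mean_c = sum(intensities) / len(intensities)
    return [tau0 * math.exp(-gamma * (c - mean_c)) for c in intensities]


# Usage: one weak-competition and one strong-competition example.
c_easy = competition_intensity(0.8, [-0.5, -0.4, -0.6, -0.3, -0.7, -0.2],
                               [0.1, 0.0, 0.2, 0.1, 0.0, 0.1], k=3)
c_hard = competition_intensity(0.5, [0.6, 0.55, 0.5, -0.2, 0.45, -0.1],
                               [0.8, 0.7, 0.9, 0.1, 0.6, 0.0], k=3)
taus = per_example_tau([c_easy, c_hard], tau0=0.1, gamma=2.0)
# taus[0] > 0.1 > taus[1]
```

The example whose slate holds high-scoring, similar negatives (like $N_2$ in Figure 1) receives a temperature below `tau0`. The one with only weak competitors (like $N_1$) receives a temperature above it. Centring on the batch mean is one simple way to stop the average sharpness from drifting as intensities change during training.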

Figures (5)

Figure 1: Simplified diagram illustrating how DSL reshapes SL competition over two slates. $N_1$ and $N_2$ are two different sets of 15 randomly sampled negatives. $N_1$ has few strong sampled competitors, so the per-example $\tau$ (shown with arrows) smooths the loss. $N_2$ has high competition, so the per-example $\tau$ sharpens the loss. The graph shows that within each set, more competitive items receive larger relative weights.
Figure 2: Health and Electronic Ablation
Figure 3: Percentage improvements for DSL over SL on NDCG@20 for Head and Tail item buckets, and the Tail--Head gap
Figure 4: Movie and Gowalla Ablation
Figure 5: Percentage improvements for DSL over SL on NDCG@20 for Head and Tail item buckets on LightGCN
