Hard Negative Sampling via Large Language Models for Recommendation

Chu Zhao, Enneng Yang, Yuting Liu, Jianzhe Zhao, Guibing Guo

TL;DR

This work tackles false hard negative samples (FHNS) in recommendation by introducing HNLMRec, a framework that uses Large Language Models (LLMs) to synthesize semantic hard negatives. It couples semantic priors from prompts with collaborative signals through a contrastive supervised fine-tuning objective, enabling the LLM to generate negatives that are semantically close to user preferences but behaviorally distinct. A theoretical analysis formalizes the semantic–behavioral gap and shows how conditional probability shift and mutual information maximization mitigate FHNS, yielding unbiased gradients. Experiments on multiple real-world datasets show that HNLMRec outperforms ID-based and other LLM-enhanced baselines, with particular gains in data-sparse and long-tail regimes and robust generalization to new domains. The approach highlights the potential of embedding-space synthesis and semantic–collaborative alignment for improving negative sampling and overall recommender performance.
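The TL;DR ties the contrastive fine-tuning objective to mutual information maximization. A standard way to express that link, used here purely as an illustration and not as HNLMRec's actual loss, is the InfoNCE objective: for a batch of N matched pairs it satisfies I(X;Y) >= log N - L_InfoNCE, so lowering the loss raises a lower bound on the mutual information between the two views. The sketch assumes hypothetical per-item semantic (LLM-side) and collaborative (CF-side) embeddings given as plain Python lists, cosine similarity, and in-batch negatives; the names `info_nce` and `cosine` and the temperature of 0.2 are illustrative choices.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def info_nce(
    semantic: List[Sequence[float]],
    collaborative: List[Sequence[float]],
    temperature: float = 0.2,
) -> float:
    """Mean InfoNCE loss over a batch (illustrative, hypothetical setup).

    Row i of `semantic` is the positive for row i of `collaborative`;
    every other row of `collaborative` serves as an in-batch negative.
    """
    if not semantic or len(semantic) != len(collaborative):
        raise ValueError("expected two non-empty batches of equal size")
    n = len(semantic)
    total = 0.0
    for i in range(n):
        logits = [cosine(semantic[i], collaborative[j]) / temperature for j in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # negative log of the softmax probability assigned to the matched pair
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    semantic = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # each semantic row matches its CF row
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # pairs deliberately mismatched
    print(info_nce(semantic, aligned) < info_nce(semantic, swapped))  # True
```

Well-aligned semantic and collaborative views give a loss near zero, while mismatched pairs are penalized heavily; the temperature controls how sharply the softmax separates the matched pair from in-batch negatives.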

Abstract

Hard negative sampling improves recommendation performance by accelerating convergence and sharpening the decision boundary. However, most existing methods rely on heuristic strategies, selecting negatives from a fixed candidate pool. Lacking semantic awareness, these methods often misclassify items that align with users' semantic interests as negatives, resulting in False Hard Negative Samples (FHNS). Such FHNS inject noisy supervision and prevent the model from reaching its optimal performance. To address this challenge, we propose HNLMRec, a generative semantic negative sampling framework. Leveraging the semantic reasoning capabilities of Large Language Models (LLMs), HNLMRec directly generates negative samples that are behaviorally distinct yet semantically relevant to user preferences. Furthermore, we integrate collaborative filtering signals into the LLM via supervised fine-tuning, guiding the model to synthesize more reliable and informative hard negatives. Extensive experiments on multiple real-world datasets demonstrate that HNLMRec significantly outperforms traditional methods and LLM-enhanced baselines, while effectively mitigating popularity bias and data sparsity, thereby improving generalization.
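To make the false-hard-negative problem concrete, the following sketch shows a generic heuristic of the kind the abstract criticizes, in the style of dynamic negative sampling (DNS): draw a fixed-size candidate pool uniformly from non-interacted items, keep the one the current model scores highest, and train with a BPR loss. It is not HNLMRec; the embeddings, pool size, and the function names `score`, `sample_hard_negative`, and `bpr_loss` are hypothetical, and preference scores are plain inner products.

```python
import math
import random
from typing import Dict, Sequence, Set


def score(user: Sequence[float], item: Sequence[float]) -> float:
    """Inner-product preference score, as in matrix-factorization / graph CF models."""
    return sum(u * v for u, v in zip(user, item))


def sample_hard_negative(
    user: Sequence[float],
    items: Dict[int, Sequence[float]],
    positives: Set[int],
    pool_size: int,
    rng: random.Random,
) -> int:
    """DNS-style heuristic: uniform candidate pool, keep the highest-scoring item.

    Nothing here checks whether the chosen item is semantically aligned with
    the user's interests, which is how false hard negatives slip in.
    """
    candidates = [i for i in items if i not in positives]
    if not candidates:
        raise ValueError("no non-interacted items to sample from")
    pool = rng.sample(candidates, min(pool_size, len(candidates)))
    return max(pool, key=lambda i: score(user, items[i]))


def bpr_loss(
    user: Sequence[float], pos_item: Sequence[float], neg_item: Sequence[float]
) -> float:
    """BPR loss -log(sigmoid(s_pos - s_neg)), computed in a numerically stable way."""
    margin = score(user, pos_item) - score(user, neg_item)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    user = [1.0, 0.0]
    # Item 0 is an observed positive; item 1 lies almost on the user's preference direction.
    items = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [-1.0, 0.0], 3: [0.0, 1.0]}
    neg = sample_hard_negative(user, items, positives={0}, pool_size=3, rng=random.Random(0))
    print(neg)  # 1
    print(bpr_loss(user, items[0], items[neg]) > bpr_loss(user, items[0], items[2]))  # True
```

In this toy case the heuristic selects item 1 because it scores almost as high as the positive. If the user would in fact like that item, it is exactly a false hard negative, and its comparatively large BPR loss pushes the model to demote an item it should rank highly.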

Paper Structure

This paper contains 34 sections, 28 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: False negative reduction (a) and cross-dataset performance comparison (b).
  • Figure 2: The overall pipeline of LLM-driven semantic negative sampling for enhancing graph CF, which consists of three main components: user-item profile generation, semantic negative sampling, and semantic alignment and training.
  • Figure 3: (a) Effect of the negative sample size; (b) and (c) convergence speed of HNLMRec compared with ID-based negative sampling baselines on Toys and Yelp.
  • Figure 4: Model potential: (a) sparsity study on Toys with varying training data ratios; (b) performance across item popularity groups.