Improving LLM-based Recommendation with Self-Hard Negatives from Intermediate Layers

Bingqian Li, Bowen Zheng, Xiaolei Wang, Long Zhang, Jinpeng Wang, Sheng Chen, Wayne Xin Zhao, Ji-Rong Wen

TL;DR

ILRec tackles the challenge of large negative spaces in LLM-based recommendation by mining token-level self-hard negatives from intermediate layers, enabling fine-grained preference learning during supervised fine-tuning. It introduces a two-stage framework: (i) cross-layer preference optimization, which penalizes negatives that intermediate layers assign high probability, and (ii) cross-layer preference distillation, which guides intermediate layers to mirror the final output. A lightweight collaborative filtering component assigns rewards to the penalized tokens to mitigate false negatives, yielding a final loss that combines $\mathcal{L}_{CPT}$ and $\mathcal{L}_{CRR}$. Across three Amazon datasets, ILRec consistently outperforms baselines; ablations confirm the contribution of each component, and further analyses show robust performance across layer choices, backbones, and tasks, as well as favorable efficiency compared to RLVR (reinforcement learning with verifiable rewards) methods.
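To make "self-hard negatives from intermediate layers" concrete, here is a minimal sketch on a toy vocabulary. It assumes that an intermediate layer already yields next-token logits (for example, by decoding its hidden state with the output head), that the self-hard negatives are simply the top-k non-target tokens at that layer, and that they are penalized on the final layer with a softmax contrastive loss over the target plus the mined negatives. The function names and the loss form are hypothetical illustrations, not the paper's exact objective.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def mine_self_hard_negatives(layer_logits: Sequence[float], target: int, k: int) -> List[int]:
    """Return the k token ids this (intermediate) layer ranks highest, excluding the target."""
    probs = softmax(layer_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [i for i in ranked if i != target][:k]


def cross_layer_penalty(final_logits: Sequence[float], target: int, negatives: Sequence[int]) -> float:
    """Hypothetical contrastive loss on the FINAL layer.

    Computes -log p(target) after renormalizing over {target} plus the negatives,
    so minimizing it pushes the final-layer logits of the intermediate-layer
    negatives down relative to the ground-truth token.
    """
    candidates = [target] + list(negatives)
    sub_logits = [final_logits[i] for i in candidates]
    return -math.log(softmax(sub_logits)[0])


# Toy example: 5-token vocabulary, ground-truth next token has id 2.
intermediate = [2.0, 1.0, 3.0, 0.5, 1.5]
final = [1.0, 0.2, 2.5, 0.1, 0.8]
negs = mine_self_hard_negatives(intermediate, target=2, k=2)
print(negs)  # [0, 4]
print(cross_layer_penalty(final, 2, negs))
```

Because the negatives are re-mined from the model's own intermediate predictions at every step, they track what the model currently confuses, which is the property the summary contrasts with offline, sequence-level negatives.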

Abstract

Large language models (LLMs) have shown great promise in recommender systems, where supervised fine-tuning (SFT) is commonly used for adaptation. Subsequent studies further introduce preference learning to incorporate negative samples into the training process. However, existing methods rely on sequence-level, offline-generated negatives, making them less discriminative and informative when adapting LLMs to recommendation tasks with large negative item spaces. To address these challenges, we propose ILRec, a novel preference fine-tuning framework for LLM-based recommendation, leveraging self-hard negative signals extracted from intermediate layers to improve preference learning. Specifically, we identify self-hard negative tokens from intermediate layers as fine-grained negative supervision that dynamically reflects the model's preference learning process. To effectively integrate these signals into training, we design a two-stage framework comprising cross-layer preference optimization and cross-layer preference distillation, enabling the model to jointly discriminate informative negatives and enhance the quality of negative signals from intermediate layers. In addition, we introduce a lightweight collaborative filtering model to assign token-level rewards for negative signals, mitigating the risk of over-penalizing false negatives. Extensive experiments on three datasets demonstrate ILRec's effectiveness in enhancing the performance of LLM-based recommender systems.
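The two remaining ingredients, distilling the final output into intermediate layers and softening the penalty on likely false negatives, can be sketched as follows. This assumes the distillation term is a KL divergence from the final-layer distribution to the intermediate-layer distribution, and that the collaborative filtering model's token-level reward is a hypothetical score in [0, 1] (`cf_reward`) that down-weights an unlikelihood-style penalty. Both choices are illustrative assumptions; how ILRec maps item-level CF scores to tokens and combines the terms is not specified here.

```python
import math
from typing import Dict, List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_layer_distillation(final_logits: Sequence[float], inter_logits: Sequence[float]) -> float:
    """KL(p_final || p_intermediate): pulls the intermediate layer toward the final output.

    In a real training loop the final-layer distribution would be detached (no gradient),
    so only the intermediate layer is updated by this term.
    """
    lp_t = log_softmax(final_logits)
    lp_s = log_softmax(inter_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp_t, lp_s))


def cf_weighted_negative_loss(final_logits: Sequence[float], negatives: Sequence[int],
                              cf_reward: Dict[int, float]) -> float:
    """Unlikelihood-style penalty -log(1 - p(neg)) per negative, scaled by (1 - reward).

    cf_reward is a HYPOTHETICAL map from token id to a score in [0, 1] from a
    collaborative filtering model; a reward near 1 marks a likely false negative,
    so its penalty is almost switched off.
    """
    lp = log_softmax(final_logits)
    total = 0.0
    for tok in negatives:
        p_neg = math.exp(lp[tok])
        weight = 1.0 - cf_reward.get(tok, 0.0)
        total += weight * -math.log(max(1.0 - p_neg, 1e-12))
    return total


final = [1.0, 0.2, 2.5, 0.1, 0.8]
intermediate = [2.0, 1.0, 3.0, 0.5, 1.5]
print(cross_layer_distillation(final, intermediate))
print(cf_weighted_negative_loss(final, [0, 4], {}))        # full penalty
print(cf_weighted_negative_loss(final, [0, 4], {0: 0.9}))  # token 0 treated as a likely false negative
```

The distillation term improves the quality of the intermediate-layer distribution, which in turn makes the negatives mined from it more informative; the CF weight guards against over-penalizing items a user might genuinely like.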

Paper Structure

This paper contains 22 sections, 13 equations, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: Limitations of traditional negative sampling methods and solutions provided by ILRec, aiming at extracting more fine-grained and informative negative signals.
  • Figure 2: Training loss curves of different layers in LLaMA3.1-8B on Instrument dataset using LC-Rec. $L$ denotes the final layer, while $L-k$ denotes the $k$-th layer before the final layer.
  • Figure 3: The overall framework of ILRec.
  • Figure 4: Performance Comparison w.r.t. Different Model Backbones on the Instrument dataset with BIGRec and LC-Rec training paradigms.
  • Figure 5: Performance Comparison w.r.t. Numbers of Intermediate Layers on the Instrument dataset with both BIGRec and LC-Rec paradigms.
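Figure 2 reports training losses for intermediate layers ($L-k$) alongside the final layer ($L$). A common way to obtain such per-layer losses is to decode each layer's hidden states with the shared output head and compute next-token cross-entropy. The sketch below assumes exactly that, on toy vectors; it omits any final normalization layer a real backbone such as LLaMA3.1-8B applies before the head, and `layerwise_losses` is a hypothetical helper rather than code from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # shape: vocab_size x hidden_dim


def lm_head(hidden: Sequence[float], weight: Matrix) -> Vector:
    """Project a hidden state to vocabulary logits with a shared output matrix."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Next-token cross-entropy: logsumexp(logits) - logits[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


def layerwise_losses(hidden_by_layer: Sequence[Sequence[Vector]], targets: Sequence[int],
                     weight: Matrix) -> List[float]:
    """Mean cross-entropy obtained by decoding each layer's hidden states with the head."""
    losses = []
    for layer_states in hidden_by_layer:
        per_tok = [cross_entropy(lm_head(h, weight), t) for h, t in zip(layer_states, targets)]
        losses.append(sum(per_tok) / len(per_tok))
    return losses


# Toy setup: 2-token vocabulary, identity head, one target position with label 0.
W = [[1.0, 0.0], [0.0, 1.0]]
print(layerwise_losses([[[5.0, 0.0]], [[0.0, 0.0]]], [0], W))
```

A layer whose loss is only moderately higher than the final layer's still ranks plausible but wrong tokens near the top, which is what makes its predictions a usable source of hard negatives.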