Bridging the Gap: Self-Optimized Fine-Tuning for LLM-based Recommender Systems

Heng Tang, Feng Liu, Xinbo Chen, Jiawei Chen, Bohao Wang, Changwang Zhang, Jun Wang, Yuegang Sun, Bingde Hu, Can Wang

TL;DR

SOFT addresses the gap between pretrained LLM knowledge and recommendation tasks by uniting Guidance-Only and Tuning-Only strategies through a curriculum-inspired approach. It first generates an auxiliary easy-to-learn dataset via self-distillation from a fine-tuned LLM, then uses a self-adaptive curriculum to progressively train on easier data before real RS data, guided by a distance-based scheduler. The method achieves substantial improvements over multiple baselines across three Amazon datasets, with an average gain of $37.59\%$, while revealing the importance of the SA module and hyperparameter tuning. The approach introduces a practical training paradigm for LLM-based recommender systems, trading modestly higher training time for notably better accuracy, and lays groundwork for extending curriculum learning to broader RS tasks. Limitations include focus on sequence-based recommendations and LoRA-only fine-tuning, suggesting directions for future work.

Abstract

Recent years have witnessed extensive exploration of Large Language Models (LLMs) in the field of Recommender Systems (RS). Two strategies are commonly used to give LLMs recommendation capabilities: 1) The "Guidance-Only" strategy uses in-context learning to exploit and amplify the inherent semantic understanding and item recommendation capabilities of LLMs; 2) The "Tuning-Only" strategy uses supervised fine-tuning (SFT) to fit LLMs to real recommendation data. However, neither strategy effectively bridges the gap between the knowledge space of LLMs and recommendation, and their performance does not meet our expectations. To better enable LLMs to learn recommendation knowledge, we combine the advantages of the two strategies and propose a novel "Guidance+Tuning" method called Self-Optimized Fine-Tuning (SOFT), which adopts the idea of curriculum learning. It first employs self-distillation to construct an auxiliary dataset, easy to learn but still meaningful, from a fine-tuned LLM. It then uses a self-adaptive curriculum scheduler so that the LLM learns gradually, moving from simpler data (self-distilled data) to more challenging data (real RS data). Extensive experiments demonstrate that SOFT significantly improves the recommendation accuracy of LLM-based methods (by 37.59\% on average). The code is available via https://anonymous.4open.science/r/Self-Optimized-Fine-Tuning-264E
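The easy-to-hard schedule described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's formulation: the function names `sd_weight` and `mixed_loss`, the `sharpness` exponent (which is not the paper's hyperparameter $\alpha$), and the choice to normalize the current distance by its initial value. The only idea taken from the text is that a distance signal decides how much training emphasis goes to self-distilled (SD) data versus real RS data.

```python
def sd_weight(distance: float, initial_distance: float, sharpness: float = 1.0) -> float:
    """Hypothetical schedule: weight on the self-distilled (SD) loss.

    `distance` is some measure of how far the model's current outputs are
    from the real RS targets (assumed non-negative). The weight is 1.0 when
    the model is as far away as it was at the start and falls to 0.0 as the
    distance shrinks, so emphasis shifts from SD data to real data.
    """
    if initial_distance <= 0:
        return 0.0  # nothing left to bridge: train on real data only
    # Clamp the normalized distance to [0, 1] before shaping it.
    ratio = min(max(distance / initial_distance, 0.0), 1.0)
    return ratio ** sharpness


def mixed_loss(loss_sd: float, loss_rs: float, weight: float) -> float:
    """Convex combination of the SD-data loss and the real-data loss."""
    return weight * loss_sd + (1.0 - weight) * loss_rs


if __name__ == "__main__":
    # Toy trace: the distance shrinks over training, so the SD weight decays.
    for step, dist in enumerate([1.0, 0.75, 0.5, 0.25, 0.0]):
        w = sd_weight(dist, initial_distance=1.0, sharpness=2.0)
        print(step, round(w, 4), round(mixed_loss(2.0, 4.0, w), 4))
```

In this sketch a larger `sharpness` hands control to the real data sooner; the actual scheduler and distance measure are defined in the paper itself and may differ.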

Paper Structure

This paper contains 19 sections, 7 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Three different training strategies used in LLM-based recommenders. The RS Dataset is the real recommendation dataset and the SD Dataset is the dataset generated via self-distillation.
  • Figure 2: (left) Comparison of the training loss on the real dataset and the SD dataset in the first epoch; (right) The accuracy of LLMs on the training dataset after SFT, where "H" refers to "Hit Ratio" and "HC" refers to "Hit Ratio of Category".
  • Figure 3: An example of an LLM's input and output. While the probability that the target item and the predicted item are exactly the same is quite low, the probability that they fall in the same category is relatively high.
  • Figure 4: Comparison of the training procedures of traditional models, SFT, and SOFT.
  • Figure 5: The performance of SOFT with different values of the hyperparameter $\alpha$.
  • ...and 1 more figure