
ITDR: An Instruction Tuning Dataset for Enhancing Large Language Models in Recommendations

Zekun Liu, Xiaowen Huang, Jitao Sang

TL;DR

This work tackles the gap between natural language instruction tuning and recommendation tasks by introducing ITDR, a large-scale instruction tuning dataset for recommender systems. ITDR is organized into two root tasks, user-item interaction ($UII$) and user-item understanding ($UIU$), which together cover seven subtasks. Its data are drawn from 13 public datasets and total 195,065 instructions. ITDR uses manually crafted templates to convert traditional user-item data into instruction-style inputs, which are then used to fine-tune open-source LLMs with LoRA, improving performance on both interaction-based and understanding-based recommendation tasks. The authors run extensive experiments across multiple backbones (e.g., LLaMA-3.2, Qwen2.5, GLM-4) and report consistent gains after fine-tuning. They also analyze the relations between tasks and the impact of task descriptions and data volume, and compare against closed-source LLMs, finding the fine-tuned models competitive. The work releases the dataset and fine-tuned models, offers insights into cross-task transfer and instruction design, and outlines practical directions for expanding instruction tuning in recommender systems.
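To make the template step concrete, here is a minimal sketch of how one user-item record might be rendered as an instruction-tuning sample. It is not ITDR's actual template: the task description text, the `Interaction` record type, the `to_instruction` helper, and the instruction/input/output field layout are all hypothetical. The `with_description` flag shows what "removing the task description" (one of the ablations mentioned above) could look like at the data level.

```python
from dataclasses import dataclass

# Hypothetical task description; the real ITDR templates are hand-written
# per subtask and are not reproduced here.
TASK_DESCRIPTION = (
    "You are a recommender. Given the user's interaction history, "
    "predict whether the user will like the candidate item."
)


@dataclass
class Interaction:
    """Hypothetical user-item record from a traditional recommendation dataset."""
    user_id: str
    history: list[str]  # titles of previously interacted items
    candidate: str      # item whose preference we want to predict
    liked: bool         # ground-truth label


def to_instruction(record: Interaction, with_description: bool = True) -> dict:
    """Render one record as an instruction/input/output triple (assumed layout)."""
    history = ", ".join(record.history)
    prompt_input = (
        f"User {record.user_id} has interacted with: {history}. "
        f"Candidate item: {record.candidate}. Answer Yes or No."
    )
    return {
        # Dropping the task description leaves only the structured input.
        "instruction": TASK_DESCRIPTION if with_description else "",
        "input": prompt_input,
        "output": "Yes" if record.liked else "No",
    }


example = Interaction("u42", ["The Matrix", "Inception"], "Interstellar", True)
print(to_instruction(example)["output"])  # Yes
print(to_instruction(example, with_description=False)["instruction"] == "")  # True
```

Keeping the answer to a fixed vocabulary ("Yes"/"No") in this sketch makes the generated output easy to score; the answer formats ITDR uses for each subtask may differ.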

Abstract

Large language models (LLMs) have demonstrated outstanding performance in natural language processing tasks. However, in the field of recommender systems, due to the inherent structural discrepancy between user behavior data and natural language, LLMs struggle to effectively model the associations between user preferences and items. Although prompt-based methods can generate recommendation results, their inadequate understanding of recommendation tasks leads to constrained performance. To address this gap, we construct a comprehensive instruction tuning dataset, ITDR, which encompasses seven subtasks across two root tasks: user-item interaction and user-item understanding. The dataset integrates data from 13 public recommendation datasets and is built using manually crafted standardized templates, comprising approximately 200,000 instances. Experimental results demonstrate that ITDR significantly enhances the performance of mainstream open-source LLMs such as GLM-4, Qwen2.5, Qwen2.5-Instruct and LLaMA-3.2 on recommendation tasks. Furthermore, we analyze the correlations between tasks and explore the impact of task descriptions and data scale on instruction tuning effectiveness. Finally, we perform comparative experiments against closed-source LLMs with massive parameter counts. Our tuning dataset ITDR, the fine-tuned large recommendation models, all LoRA modules, and the complete experimental results are available at https://github.com/hellolzk/ITDR.
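For readers unfamiliar with what a released "LoRA module" contains, the block below restates the standard LoRA formulation: the pretrained weight $W_0$ stays frozen and only the low-rank factors $A$ and $B$ are trained, so a LoRA module is essentially these small matrices. This is the general method, not a detail specific to ITDR; the dimensions $d = k = 4096$ and rank $r = 8$ are purely illustrative assumptions, since the rank, scaling $\alpha$, and target layers used in the paper are not stated here.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
  \qquad W_0 \in \mathbb{R}^{d \times k},\;
  B \in \mathbb{R}^{d \times r},\;
  A \in \mathbb{R}^{r \times k},\;
  r \ll \min(d, k) \\
\#\text{trainable} &= r(d + k) = 8 \cdot (4096 + 4096) = 65{,}536
  \qquad (d = k = 4096,\ r = 8) \\
\frac{r(d + k)}{dk} &= \frac{65{,}536}{16{,}777{,}216} \approx 0.39\%
\end{aligned}
```

Under these illustrative sizes, each adapted matrix needs under half a percent of its full parameter count, which is why per-backbone LoRA modules are cheap to store and distribute alongside a dataset.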

Paper Structure

This paper contains 29 sections, 10 equations, 6 figures, 24 tables.

Figures (6)

  • Figure 1: ITDR architecture: tasks, subtasks, and datasets.
  • Figure 2: pUII and pUIU of all backbone models on the two root tasks before and after fine-tuning. "ITDR-*" denotes backbone model "*" after fine-tuning on ITDR.
  • Figure 3: Average performance of removing different root tasks.
  • Figure 4: Ablation studies of using no task descriptions during fine-tuning.
  • Figure 5: Average performance of using different data volumes for fine-tuning.
  • ...and 1 more figure
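The ablations in Figures 3 and 5 vary the fine-tuning mixture, either by removing a whole root task or by shrinking the amount of data. The sketch below shows one way such mixtures could be assembled; it is an illustration, not the authors' procedure. The `build_mixture` helper, the sample keys (`root`, `subtask`), and the per-subtask stratified subsampling are assumptions, and the subtask names in the toy data are placeholders rather than ITDR's seven subtasks.

```python
import random
from collections import defaultdict


def build_mixture(samples, drop_root=None, fraction=1.0, seed=0):
    """Assemble a fine-tuning set for an ablation run (hypothetical helper).

    samples:   list of dicts with assumed keys "root" ("UII" or "UIU") and "subtask".
    drop_root: root task to exclude entirely (cf. Figure 3), or None to keep both.
    fraction:  share of each subtask to keep (cf. Figure 5), in (0, 1].
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so ablation runs are reproducible

    # Group by subtask so subsampling keeps every subtask represented.
    by_subtask = defaultdict(list)
    for s in samples:
        if s["root"] != drop_root:
            by_subtask[s["subtask"]].append(s)

    mixture = []
    for subtask in sorted(by_subtask):
        group = by_subtask[subtask]
        # Keep a rounded share of each subtask, but never fewer than one sample.
        k = max(1, round(len(group) * fraction))
        mixture.extend(rng.sample(group, k))
    rng.shuffle(mixture)
    return mixture


# Toy data with placeholder subtask names.
toy = [{"root": "UII", "subtask": "uii_a", "id": i} for i in range(10)] + [
    {"root": "UIU", "subtask": "uiu_a", "id": i} for i in range(10)
]
print(len(build_mixture(toy)))                  # 20
print(len(build_mixture(toy, drop_root="UIU")))  # 10
print(len(build_mixture(toy, fraction=0.5)))     # 10
```

Stratifying by subtask is a design choice made here so that small subtasks do not vanish at low data volumes; uniform sampling over the whole pool would be an equally plausible reading of a "data volume" ablation.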