Aligning Large Language Models with Recommendation Knowledge

Yuwei Cao; Nikhil Mehta; Xinyang Yi; Raghunandan Keshavan; Lukasz Heldt; Lichan Hong; Ed H. Chi; Maheswaran Sathiamoorthy

TL;DR

The paper addresses the gap between the knowledge held by large language models (LLMs) and the domain-specific knowledge required for effective recommendations. It introduces auxiliary-task data generation, which converts classical recommender operations (Masked Item Modeling, Masked Language Modeling, and Bayesian Personalized Ranking) into natural-language prompts, and pairs these with more informative recommendation-task prompts that omit user IDs and include item titles. A simple multi-task fine-tuning framework aligns LLM backbones (FLAN-T5-XL/Base) with recommendation knowledge and is evaluated on three real-world domains (Amazon Toys & Games, Beauty, Sports & Outdoors), yielding substantial retrieval improvements and competitive ranking and rating-prediction performance, often surpassing state-of-the-art baselines. The work demonstrates the practical potential of injecting recommendation-specific knowledge into LLMs, while noting computational costs and the need for further refinement of data prompts and content representations.

Abstract

Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions inherent in recommendation tasks. We propose bridging the knowledge gap and equipping LLMs with recommendation-specific knowledge to address this. Operations such as Masked Item Modeling (MIM) and Bayesian Personalized Ranking (BPR) have found success in conventional recommender systems. Inspired by this, we simulate these operations through natural language to generate auxiliary-task data samples that encode item correlations and user preferences. Fine-tuning LLMs on such auxiliary-task data samples and incorporating more informative recommendation-task data samples facilitates the injection of recommendation-specific knowledge into LLMs. Extensive experiments across retrieval, ranking, and rating prediction tasks on LLMs such as FLAN-T5-Base and FLAN-T5-XL show the effectiveness of our technique in domains such as Amazon Toys & Games, Beauty, and Sports & Outdoors. Notably, our method outperforms conventional and LLM-based baselines, including the current SOTA, by significant margins in retrieval, showcasing its potential for enhancing recommendation quality.
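
To make the idea of simulating MIM and BPR "through natural language" concrete, the following is a minimal sketch of how such auxiliary-task samples might be built from a user's purchase sequence. It is an illustration only: the prompt wording, the `<mask>` placeholder, the masking probability, and the function names `mim_sample` and `bpr_sample` are assumptions made here, not the templates or code used in the paper.

```python
import random

MASK = "<mask>"  # hypothetical placeholder; the paper's actual sentinel may differ


def mim_sample(titles, rng, mask_prob=0.2):
    """Masked Item Modeling as text: hide some item titles and ask for them."""
    masked = {i for i in range(len(titles)) if rng.random() < mask_prob}
    if not masked:
        # Always mask at least one position so every sample has a target.
        masked = {rng.randrange(len(titles))}
    shown = [MASK if i in masked else t for i, t in enumerate(titles)]
    # Assumed prompt wording, for illustration only.
    prompt = (
        "A user has purchased the following items in order: "
        + "; ".join(shown)
        + ". What are the masked items?"
    )
    target = "; ".join(titles[i] for i in sorted(masked))
    return {"input": prompt, "target": target}


def bpr_sample(history, positive, catalog, rng):
    """BPR as text: contrast an interacted item with a sampled negative."""
    # Negatives are items the user has not interacted with.
    candidates = [t for t in catalog if t not in history and t != positive]
    negative = rng.choice(candidates)
    pair = [positive, negative]
    rng.shuffle(pair)  # avoid a positional shortcut to the answer
    prompt = (
        "A user has purchased: "
        + "; ".join(history)
        + f". Which item would the user prefer: {pair[0]} or {pair[1]}?"
    )
    return {"input": prompt, "target": positive}
```

Both functions return plain input/target text pairs, so under these assumptions they could be mixed with recommendation-task samples in a single sequence-to-sequence fine-tuning set. Item titles rather than user IDs carry the signal, in line with the prompts described above.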

Paper Structure (25 sections, 3 equations, 3 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Methodology
Auxiliary-task Data Generation
Masked Item Modeling (MIM)
Masked Language Modeling (MLM)
Bayesian Personalized Ranking (BPR)
Recommendation-task Data Generation
Fine-tuning and Evaluation Framework
Experiments
Experimental Setting
Overall Performance (RQ1 & RQ2)
Ablation Studies (RQ3 & RQ4)
Conclusion
Limitations
...and 10 more sections

Figures (3)

Figure 1: Data samples adopted by existing studies and this work. (a) shows the recommendation-task data samples of existing studies. Specifically, (a1)-(a3) show the retrieval, ranking, and rating prediction data samples of P5 (Geng et al., 2022); (a4) shows a ranking (type <P1, I0, T3>) data sample of InstructRec (Zhang et al., 2023); (a5) is a rating prediction data sample of TALLRec (Bao et al., 2023). (b) shows our recommendation-task (blue boxes) and auxiliary-task (purple boxes) data samples (more samples are presented in the appendix).
Figure 2: Fine-tuning and evaluation framework.
Figure 3: Item embedding (IE) data samples.
