Aligning Large Language Models with Recommendation Knowledge
Yuwei Cao, Nikhil Mehta, Xinyang Yi, Raghunandan Keshavan, Lukasz Heldt, Lichan Hong, Ed H. Chi, Maheswaran Sathiamoorthy
TL;DR
The paper addresses the gap between what large language models (LLMs) know and the domain-specific knowledge that effective recommendation requires. It generates auxiliary-task data by recasting classical recommender training operations (Masked Item Modeling, Masked Language Modeling, and Bayesian Personalized Ranking) as natural-language prompts, and pairs these with more informative recommendation-task prompts that omit user IDs and include item titles. A simple multi-task fine-tuning framework aligns the LLM backbones (FLAN-T5-XL and FLAN-T5-Base) with three real-world domains (Amazon Toys & Games, Beauty, and Sports & Outdoors), yielding substantial retrieval improvements and competitive ranking and rating-prediction performance, often surpassing state-of-the-art baselines. The work demonstrates the practical potential of injecting recommendation-specific knowledge into LLMs, while noting computational costs and the need to further refine the data prompts and content representations.
Abstract
Large language models (LLMs) have recently been used as backbones for recommender systems. However, their performance often lags behind conventional methods in standard tasks like retrieval. We attribute this to a mismatch between LLMs' knowledge and the knowledge crucial for effective recommendations. While LLMs excel at natural language reasoning, they cannot model complex user-item interactions inherent in recommendation tasks. We propose bridging the knowledge gap and equipping LLMs with recommendation-specific knowledge to address this. Operations such as Masked Item Modeling (MIM) and Bayesian Personalized Ranking (BPR) have found success in conventional recommender systems. Inspired by this, we simulate these operations through natural language to generate auxiliary-task data samples that encode item correlations and user preferences. Fine-tuning LLMs on such auxiliary-task data samples and incorporating more informative recommendation-task data samples facilitates the injection of recommendation-specific knowledge into LLMs. Extensive experiments across retrieval, ranking, and rating prediction tasks on LLMs such as FLAN-T5-Base and FLAN-T5-XL show the effectiveness of our technique in domains such as Amazon Toys & Games, Beauty, and Sports & Outdoors. Notably, our method outperforms conventional and LLM-based baselines, including the current SOTA, by significant margins in retrieval, showcasing its potential for enhancing recommendation quality.
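To make the idea of simulating MIM and BPR through natural language more concrete, the following is a minimal sketch of how such auxiliary-task samples might be generated from a user's purchase history. The prompt wording, the `[MASK]` placeholder, the function names `mim_sample` and `bpr_sample`, and the toy item titles are all illustrative assumptions, not the templates used in the paper.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mim_sample(history, rng):
    """Masked Item Modeling as text: hide one title, ask the model to recover it."""
    idx = rng.randrange(len(history))
    masked = [MASK if i == idx else title for i, title in enumerate(history)]
    prompt = (
        "A user bought these items in order: "
        + "; ".join(masked)
        + f". Which item does {MASK} stand for?"
    )
    return prompt, history[idx]


def bpr_sample(history, catalog, rng):
    """BPR as text: contrast the held-out last item with a sampled negative."""
    context, positive = history[:-1], history[-1]
    # Negatives are drawn from items the user never interacted with.
    negative = rng.choice([t for t in catalog if t not in history])
    options = [positive, negative]
    rng.shuffle(options)  # avoid a fixed position for the correct answer
    prompt = (
        "A user bought these items in order: "
        + "; ".join(context)
        + f". Which item would the user prefer next: {options[0]} or {options[1]}?"
    )
    return prompt, positive


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data, invented for illustration only.
    history = ["Lego Classic Bricks", "Uno Card Game", "Jenga"]
    catalog = history + ["Yoga Mat", "Chess Set"]
    for prompt, target in (mim_sample(history, rng), bpr_sample(history, catalog, rng)):
        print(prompt, "->", target)
```

Each function returns an (input text, target text) pair, which is the shape a text-to-text backbone such as FLAN-T5 would be fine-tuned on. Note that the prompts refer to items by title and carry no user ID, in line with the recommendation-task prompts described above.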
