ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

Jianghao Lin; Rong Shan; Chenxu Zhu; Kounianhua Du; Bo Chen; Shigang Quan; Ruiming Tang; Yong Yu; Weinan Zhang

TL;DR

This work identifies the lifelong sequential behavior incomprehension problem that arises when large language models are applied to recommendation: LLMs struggle to extract useful information from long user histories even when the input is well within their context limit. It introduces ReLLa, a retrieval-enhanced framework with two parts. In the zero-shot setting, semantic user behavior retrieval (SUBR) improves the data quality of test samples. In the few-shot setting, retrieval-enhanced instruction tuning (ReiT) trains the model on a mixed dataset of original samples and their retrieval-enhanced counterparts. On three real-world datasets, ReLLa outperforms existing baselines on CTR prediction and shows better comprehension of long behavior sequences; with less than 10% of the training samples, few-shot ReLLa outperforms traditional CTR models trained on the full training set. The approach offers a practical path to using LLMs for recommendation when data is scarce and user histories are long, though it notes higher inference latency than traditional methods.

Abstract

With large language models (LLMs) achieving remarkable breakthroughs in natural language processing (NLP), LLM-enhanced recommender systems have received much attention and are being actively explored. In this paper, we focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks. First, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation domains, i.e., LLMs fail to extract useful information from the textual context of a long user behavior sequence, even when the length of the context is far below the context limit of the LLM. To address this issue and improve the recommendation performance of LLMs, we propose a novel framework, Retrieval-enhanced Large Language Models (ReLLa), for recommendation tasks in both zero-shot and few-shot settings. For zero-shot recommendation, we perform semantic user behavior retrieval (SUBR) to improve the data quality of testing samples, which greatly reduces the difficulty for LLMs of extracting the essential knowledge from user behavior sequences. For few-shot recommendation, we further design retrieval-enhanced instruction tuning (ReiT), which adopts SUBR as a data augmentation technique for training samples. Specifically, we develop a mixed training dataset consisting of both the original data samples and their retrieval-enhanced counterparts. We conduct extensive experiments on three real-world public datasets to demonstrate the superiority of ReLLa over existing baseline models, as well as its capability for lifelong sequential behavior comprehension. Notably, with less than 10% of the training samples, few-shot ReLLa can outperform traditional CTR models that are trained on the entire training set (e.g., DCNv2, DIN, SIM). The code is available at https://github.com/LaVieEnRose365/ReLLa.
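The retrieval step behind SUBR can be illustrated with a minimal Python sketch. It assumes that every item already has a fixed-length semantic embedding and ranks a user's past behaviors by cosine similarity to the target item. The similarity measure, the function name `retrieve_top_k`, the choice to return retrieved items in chronological order, and the hand-written toy vectors are all assumptions made for illustration; none of them is taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(history, target_vec, item_vecs, k):
    """Pick the k behaviors in `history` most similar to the target item.

    Assumption of this sketch: the selected behaviors are returned in their
    original chronological order rather than in similarity order.
    """
    ranked = sorted(
        range(len(history)),
        key=lambda i: cosine(item_vecs[history[i]], target_vec),
        reverse=True,
    )
    keep = sorted(ranked[:k])  # restore chronological order
    return [history[i] for i in keep]


# Toy 2-d embeddings written by hand (hypothetical, not model outputs).
item_vecs = {
    "Toy Story": [1.0, 0.0],
    "Heat": [0.0, 1.0],
    "Shrek": [0.9, 0.1],
    "Alien": [0.1, 0.9],
}
history = ["Heat", "Toy Story", "Alien", "Shrek"]  # oldest to newest
target_vec = [1.0, 0.0]  # hypothetical embedding of the target item

retrieved = retrieve_top_k(history, target_vec, item_vecs, k=2)
print(retrieved)
```

With these toy vectors, the two behaviors closest to the target replace the two most recent ones in the prompt, which is the sense in which retrieval shortens and focuses the history that the LLM has to read.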

Paper Structure (38 sections, 3 equations, 11 figures, 9 tables)

Introduction
Preliminaries
Zero-shot and Few-shot Recommendations
Textual Input-Output Pair Formulation
Pointwise Scoring with LLMs
Methodology
Overview of ReLLa
Semantic User Behavior Retrieval
Retrieval-enhanced Instruction Tuning
Experiment
Experiment Setup
Datasets
Evaluation Metrics
Baseline Models
Implementation Details
...and 23 more sections

Figures (11)

Figure 1: The illustration of lifelong sequential behavior incomprehension problem for LLMs. We report the AUC performance of SIM and Vicuna-13B on MovieLens-1M dataset. While SIM enjoys steady performance improvement as the length of behavior sequence $K$ grows, Vicuna-13B only peaks at $K=15$ and fails to extract the useful information with further longer sequences (i.e., $K>15$).
Figure 2: Illustration of textual input-output pair.
Figure 3: Illustration of semantic user behavior retrieval (SUBR), which improves the data quality by retrieving the top-$K$ semantically relevant behaviors towards the target item. This reduces the difficulty for LLMs to extract useful information from the user history, and therefore alleviates the long user behavior sequence incomprehension problem.
Figure 4: Illustration of descriptive text for an item (movie).
Figure 5: Illustration of retrieval-enhanced instruction tuning, where we construct a mixed training dataset. The mixed dataset consists of both the original textual input-output samples and their retrieval-enhanced counterparts obtained via semantic user behavior retrieval (SUBR).
...and 6 more figures
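The mixed training dataset described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the original sample is taken to use the most recent `k` behaviors, the retriever is passed in as a function, and `build_mixed_dataset`, `toy_retrieve`, the `genre:title` item strings, and the shuffling step are hypothetical names and choices introduced here rather than details from the paper.

```python
import random


def build_mixed_dataset(samples, retrieve, k, seed=0):
    """Pair each original sample with a retrieval-enhanced counterpart.

    Assumption of this sketch: the original sample keeps the most recent k
    behaviors, while the counterpart keeps k behaviors chosen by `retrieve`.
    """
    mixed = []
    for s in samples:
        base = {"target": s["target"], "label": s["label"]}
        mixed.append({**base, "history": s["history"][-k:], "source": "original"})
        mixed.append(
            {**base, "history": retrieve(s["history"], s["target"], k), "source": "retrieved"}
        )
    random.Random(seed).shuffle(mixed)  # interleave both kinds of sample
    return mixed


def toy_retrieve(history, target, k):
    """Stand-in retriever: keep behaviors that share the target's genre prefix."""
    genre = target.split(":")[0]
    return [h for h in history if h.split(":")[0] == genre][-k:]


# Hypothetical samples; items are written as "genre:title" strings.
samples = [
    {
        "history": ["animation:a1", "crime:c1", "animation:a2", "crime:c2", "crime:c3"],
        "target": "animation:a3",
        "label": 1,
    },
    {"history": ["crime:c1", "animation:a1", "crime:c2"], "target": "crime:c4", "label": 0},
]

mixed = build_mixed_dataset(samples, toy_retrieve, k=2)
for row in mixed:
    print(row["source"], row["target"], row["history"])
```

Each input sample yields two training rows that share the same target and label and differ only in which behaviors fill the history, so the model sees both recency-based and retrieval-based views of the same interaction.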
