Language-Based User Profiles for Recommendation

Joyce Zhou; Yijia Dai; Thorsten Joachims

TL;DR

The paper proposes the Language-based Factorization Model (LFM), an encoder/decoder model in which both the encoder and the decoder are large language models (LLMs). Generating a compact, human-readable summary profile often performs comparably with or better than direct LLM prediction, while offering better interpretability and shorter model input length.

Abstract

Most conventional recommendation methods (e.g., matrix factorization) represent user profiles as high-dimensional vectors. Unfortunately, these vectors lack interpretability and steerability, and often perform poorly in cold-start settings. To address these shortcomings, we explore the use of user profiles that are represented as human-readable text. We propose the Language-based Factorization Model (LFM), which is essentially an encoder/decoder model where both the encoder and the decoder are large language models (LLMs). The encoder LLM generates a compact natural-language profile of the user's interests from the user's rating history. The decoder LLM uses this summary profile to complete predictive downstream tasks. We evaluate our LFM approach on the MovieLens dataset, comparing it against matrix factorization and an LLM model that directly predicts from the user's rating history. In cold-start settings, we find that our method can have higher accuracy than matrix factorization. Furthermore, we find that generating a compact and human-readable summary often performs comparably with or better than direct LLM prediction, while enjoying better interpretability and shorter model input length. Our results motivate a number of future research directions and potential improvements.
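
The encoder/decoder flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable interface, the prompt wording, the `max_words` limit, and the 0.5 to 5 rating range are all assumptions made for illustration, and `stub_llm` is a hypothetical stand-in for a real model. The sketch only shows the shape of the pipeline: a rating history is compressed into a text profile, and a prediction is then made from that profile alone.

```python
import re
from typing import Callable, Optional

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def encode_profile(llm: LLM, history: list[tuple[str, float]],
                   max_words: int = 50) -> str:
    """Encoder step: summarize a rating history as a short text profile.

    The prompt wording is an assumption, not the paper's prompt.
    """
    lines = "\n".join(f"- {title}: {rating:g}/5" for title, rating in history)
    prompt = (
        f"Here are movies a user rated:\n{lines}\n"
        f"Describe this user's tastes in at most {max_words} words."
    )
    return llm(prompt).strip()


def decode_rating(llm: LLM, profile: str, movie: str,
                  lo: float = 0.5, hi: float = 5.0) -> Optional[float]:
    """Decoder step: predict a rating using only the text profile.

    Returns None when no in-range number can be extracted from the reply.
    The valid range [lo, hi] is an assumed rating scale.
    """
    prompt = (
        f"User profile: {profile}\n"
        f"Predict this user's rating for the movie '{movie}' "
        f"as a number from {lo:g} to {hi:g}."
    )
    match = re.search(r"\d+(?:\.\d+)?", llm(prompt))
    if match is None:
        return None
    value = float(match.group())
    return value if lo <= value <= hi else None


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM, with canned replies."""
    if prompt.startswith("Here are movies"):
        return "Enjoys science fiction; dislikes romantic comedies."
    return "Predicted rating: 4.5"


history = [("Alien", 5.0), ("Notting Hill", 1.5)]
profile = encode_profile(stub_llm, history)
print(profile)
print(decode_rating(stub_llm, profile, "Blade Runner"))
```

Because the decoder sees only the profile, its input length is bounded by the profile size rather than by the length of the rating history. Returning `None` for unreadable replies keeps failed predictions separate from wrong ones, so the two can be counted independently.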

Paper Structure (15 sections, 21 figures, 1 table)

Introduction
Related Work
Methods
Dataset and Experiment Setup
Experiment Results
How often do the methods fail to make a prediction?
How does LFM compare against other methods?
How does LFM accuracy vary with the profile size?
How does LFM compare against NMF with varying amounts of background data?
Discussion and Directions
Runtime
Task Prediction Extraction
Prompts and Hyperparameters
Example Summaries
More Experiment Results

Figures (21)

Figure 1: Summary of how our user representation method works and what tasks we tested it on.
Figure 2: Fraction of readable predictions for all tasks with different methods and models vs history size.
Figure 3: Performance (RMSE, MAE, and error rate) for all tasks with different methods (using Llama 2 13B) vs history size.
Figure 4: Bias (mean error) of rating prediction task with different methods and models vs history sizes.
Figure 5: Performance metrics (RMSE, MAE, and error rate) for all tasks with different LFM summary lengths with history size 30.
...and 16 more figures
