End-to-end Training for Recommendation with Language-based User Profiles

Zhaolin Gao, Joyce Zhou, Yijia Dai, Thorsten Joachims

TL;DR

LangPTune addresses the limited transparency of embedding-based recommender systems by training language-based user profiles end to end. It couples a Profile Encoder with a Recommender Decoder, optimizing the encoder via Reinforcement Learning for System Optimization (RLSO) and aligning the decoder through Contrastive Learning (CL). On Amazon-Movie-TV and Amazon-Books, with Gemma and Llama models, LangPTune outperforms zero-shot language-based baselines and rivals state-of-the-art embedding-based methods, and the interpretability of its profiles is assessed through GPT-4 simulations and crowdworker studies. The results suggest that interpretable, steerable recommendations are practical at competitive accuracy, and the code is open source for replication and extension.

Abstract

There is growing interest in natural language-based user profiles for recommender systems, which aim to enhance transparency and scrutability compared with embedding-based methods. Existing studies primarily generate these profiles using zero-shot inference from large language models (LLMs), but their quality remains insufficient, leading to suboptimal recommendation performance. In this paper, we introduce LangPTune, the first end-to-end training framework to optimize LLM-generated user profiles. Our method significantly outperforms zero-shot approaches by explicitly training the LLM for the recommendation objective. Through extensive evaluations across diverse training configurations and benchmarks, we demonstrate that LangPTune not only surpasses zero-shot baselines but can also match the performance of state-of-the-art embedding-based methods. Finally, we investigate whether the training procedure preserves the interpretability of these profiles compared to zero-shot inference through both GPT-4 simulations and crowdworker user studies. An implementation of LangPTune can be found at https://github.com/ZhaolinGao/LangPTune.
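
To make the profile-then-recommend idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a text profile and each candidate item have already been mapped to fixed-size vectors, that the decoder ranks items by cosine similarity to the profile, and that the contrastive objective is a generic InfoNCE loss. The function names (`rank_items`, `info_nce_loss`), the temperature value, and the toy vectors are all hypothetical; the paper's actual objective and models may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_items(profile_vec, item_vecs):
    """Return item indices sorted by descending similarity to the profile."""
    scores = [cosine(profile_vec, v) for v in item_vecs]
    return sorted(range(len(item_vecs)), key=lambda i: scores[i], reverse=True)


def info_nce_loss(profile_vec, item_vecs, positive_idx, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not taken from the paper).

    Treats the item at positive_idx as the ground-truth next item and
    all other items as negatives.
    """
    logits = [cosine(profile_vec, v) / temperature for v in item_vecs]
    # Numerically stable log-sum-exp over all candidate items.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[positive_idx]


# Toy stand-ins for an embedded profile and three embedded items.
profile = [1.0, 0.0]
items = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]

ranking = rank_items(profile, items)
loss_if_close_item_is_target = info_nce_loss(profile, items, positive_idx=1)
loss_if_far_item_is_target = info_nce_loss(profile, items, positive_idx=2)
```

In this toy setup the loss is small when the ground-truth item already lies close to the profile vector and large when it lies far away, which is the pressure a contrastive objective puts on the embedding space.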

Paper Structure

This paper contains 35 sections, 12 equations, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: Reinforcement Learning for System Optimization Pipeline
  • Figure 2: Example of the first three points from the profiles generated by Llama-3-8B-it before and after training with LangPTune. The highlighted text represents the additional details generated after training.
  • Figure 3: Ablation of Train Data Ratio on Amazon-Movie-TV.
  • Figure 4: Visualizations with Llama-3-8B-it on the Amazon-Movie-TV dataset. (Left) Visualization of the embeddings of profiles from LangPTune-0 and LangPTune-RLSO using Mxbai and t-SNE. (Middle) Length vs. similarity to the ground-truth item for the profiles from NoProfile-0, LangPTune-0, and LangPTune-RLSO. The red arrow indicates the optimal direction. (Right) Reward for LangPTune-RLSO during training.