Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment

Yi Fang, Wenjie Wang, Yang Zhang, Fengbin Zhu, Qifan Wang, Fuli Feng, Xiangnan He

TL;DR

This work addresses a limitation of recommendation LLMs (RecLLMs): they are optimized only to predict user feedback directly, which can hurt reliability in complex scenarios. It introduces the Deliberative Recommendation task and the Reason4Rec framework, which decomposes reasoning into three collaborative steps (Summarizer, Reasoner, and Predictor), guided by verbalized user feedback (reviews) and implemented as three QLoRA adapters. Across three real-world datasets, Reason4Rec improves rating prediction accuracy (lower MAE/RMSE) and reasoning quality (BLEURT, GPTScore) over traditional, review-based, and other LLM-based baselines, supporting the value of slow, multi-step deliberation. The approach advances interpretability and reliability in RecLLMs and points to future work on richer verbalized feedback, efficiency optimizations, and interactive human-AI learning settings.
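For readers unfamiliar with the accuracy metrics named above, the following is a minimal sketch of how MAE and RMSE are conventionally computed for rating prediction. The function names and the toy ratings are illustrative assumptions, not values or code from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy ratings on a 1-5 scale (made up for illustration only).
ratings_true = [5, 3, 4, 1]
ratings_pred = [4, 3, 5, 3]

mae_value = mae(ratings_true, ratings_pred)    # absolute errors: 1, 0, 1, 2
rmse_value = rmse(ratings_true, ratings_pred)  # squared errors: 1, 0, 1, 4
```

Both metrics are lower-is-better. RMSE penalizes large individual errors more heavily than MAE, which is why the two are often reported together.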

Abstract

While recent advancements in aligning Large Language Models (LLMs) with recommendation tasks have shown great potential and promising performance overall, these aligned recommendation LLMs still face challenges in complex scenarios. This is primarily due to the current alignment approach focusing on optimizing LLMs to generate user feedback directly, without incorporating deliberation. To overcome this limitation and develop more reliable LLMs for recommendations, we propose a new Deliberative Recommendation task, which incorporates explicit reasoning about user preferences as an additional alignment goal. We then introduce the Reasoning-powered Recommender (Reason4Rec) framework for deliberative user preference alignment, designed to enhance reasoning capabilities by utilizing verbalized user feedback in a step-wise manner to tackle this task. The framework employs collaborative step-wise experts and tailored training strategies for each expert. Experimental results across three real-world datasets demonstrate the rationality of the deliberative task formulation and the superior performance of the proposed framework in improving both prediction accuracy and reasoning quality.
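The step-wise design described above can be pictured as a chain of three experts. The sketch below is only an illustration under stated assumptions: the source names the steps (Summarizer, Reasoner, Predictor) but this page does not specify their inputs, outputs, or prompts, so the data flow, the prompt strings, the `Expert` type, and the stub functions are all hypothetical stand-ins for the fine-tuned adapters.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: each expert maps a prompt string to generated text.
Expert = Callable[[str], str]

def deliberative_predict(
    past_reviews: List[str],
    item_text: str,
    summarizer: Expert,
    reasoner: Expert,
    predictor: Expert,
) -> Tuple[str, float]:
    """Chain three experts; the data flow here is an assumption."""
    # Step 1 (assumed): condense the user's past reviews into a preference summary.
    summary = summarizer("Summarize preferences:\n" + "\n".join(past_reviews))
    # Step 2 (assumed): reason about how the summarized preferences match the item.
    reasoning = reasoner(f"Preferences: {summary}\nItem: {item_text}\nReason:")
    # Step 3 (assumed): turn the reasoning into a numeric rating.
    rating = float(predictor(f"Reasoning: {reasoning}\nRating:"))
    return reasoning, rating

# Stub experts standing in for the fine-tuned adapters (illustrative only).
def stub_summarizer(prompt: str) -> str:
    return "likes lightweight hiking gear"

def stub_reasoner(prompt: str) -> str:
    return "The item is lightweight, which fits the user's taste."

def stub_predictor(prompt: str) -> str:
    return "4.0"

reasoning, rating = deliberative_predict(
    ["Great tent, very light."],
    "Ultralight backpack",
    stub_summarizer,
    stub_reasoner,
    stub_predictor,
)
```

Keeping each expert behind the same text-in, text-out interface means a stub can be swapped for a real model call without changing the chaining logic, and the intermediate reasoning stays available for inspection alongside the predicted rating.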

Paper Structure

This paper contains 25 sections, 3 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison between the alignment objective of existing research, which optimizes LLMs to directly predict user feedback, and the objective of Deliberative Recommendation, which optimizes LLMs to conduct explicit reasoning about user preferences before generating the prediction.
  • Figure 2: Illustration of the Reasoning-powered Recommender framework.
  • Figure 3: Necessity of each step. An ablation study on each step in Reason4Rec.
  • Figure 4: One-step vs. multi-step. Performance comparison between Reason4Rec's multi-step reasoning strategy and two alternative one-step reasoning strategies.
  • Figure 5: Case study on whether the reasons generated by our Reason4Rec and the baselines align with user preferences. The "User Review" includes the ground-truth user preferences.