Reasoning Meets Personalization: Unleashing the Potential of Large Reasoning Model for Personalized Generation
Sichun Luo, Guanzhi Deng, Jian Xu, Xiaojie Zhang, Hanxu Hou, Linqi Song
TL;DR
This paper conducts the first systematic evaluation of Large Reasoning Models (LRMs) for personalization using the LaMP benchmark, revealing that LRMs often lag behind general-purpose LLMs in retrieval-heavy settings. It identifies three core limitations—limited divergent thinking, poor response-format alignment, and inefficient use of retrieved knowledge—that hinder personalization performance. To address these, the authors introduce Reinforced Reasoning for Personalization (R2P), comprising a Hierarchical Reasoning Thought Template (HRT), a Reasoning Process Intervention (RPI), and a Self-Referencing Module (SRM). Through extensive experiments, R2P achieves superior performance across LaMP tasks, especially with more contextual information, demonstrating the value of structured reasoning, alignment interventions, and consistency checks for personalized generation.
Abstract
Personalization is a critical task in modern intelligent systems, with applications spanning diverse domains, including interactions with large language models (LLMs). Recent advances in reasoning capabilities have significantly enhanced LLMs, enabling unprecedented performance in tasks such as mathematics and coding. However, their potential for personalization tasks remains underexplored. In this paper, we present the first systematic evaluation of large reasoning models (LRMs) for personalization tasks. Surprisingly, despite generating more tokens, LRMs do not consistently outperform general-purpose LLMs, especially in retrieval-intensive scenarios where their advantages diminish. Our analysis identifies three key limitations: divergent thinking, misalignment of response formats, and ineffective use of retrieved information. To address these challenges, we propose Reinforced Reasoning for Personalization (R2P), a novel framework that incorporates a hierarchical reasoning thought template to guide LRMs in generating structured outputs. Additionally, we introduce a reasoning process intervention method to enforce adherence to designed reasoning patterns, enhancing alignment. We also propose a cross-referencing mechanism to ensure consistency. Extensive experiments demonstrate that our approach significantly outperforms existing techniques.
