Unveiling Inference Scaling for Difference-Aware User Modeling in LLM Personalization
Suyu Chen, Yimeng Bai, Yulong Huang, Xiaoyan Zhao, Yang Zhang
TL;DR
This work tackles the limitation of fixed feature spaces and shallow inference in LLM personalization by introducing Difference-aware Reasoning Personalization (DRP), which employs inference scaling to enable slow, deliberate reasoning over user differences. DRP autonomously discovers relevant feature dimensions, defines them, and validates them through reflective reasoning before generating personalized outputs; it also leverages representative-user comparisons to capture inter-user differences. Empirically, DRP with reasoning-enhanced backbones yields substantial improvements (up to 23.0% in BLEU on Books) and demonstrates that larger models and reasoning enhance both coverage and granularity of user-difference features, as shown by the UVQ metric and qualitative analyses. Overall, the framework demonstrates that scalable, fine-grained personalization is achievable in a training-free manner through inference scaling and structured difference extraction, with strong implications for personalized content generation in real-world deployments.
Abstract
Large Language Models (LLMs) are increasingly integrated into users' daily lives, driving a growing demand for personalized outputs. Prior work has primarily leveraged a user's own history, often overlooking inter-user differences that are critical for effective personalization. While recent methods have attempted to model such differences, their feature extraction processes typically rely on fixed dimensions and quick, intuitive inference (System-1 thinking), limiting both the coverage and granularity of captured user differences. To address these limitations, we propose Difference-aware Reasoning Personalization (DRP), a framework that reconstructs the difference extraction mechanism by leveraging inference scaling to enhance LLM personalization. DRP autonomously identifies relevant difference feature dimensions and generates structured definitions and descriptions, enabling slow, deliberate reasoning (System-2 thinking) over user differences. Experiments on personalized review generation demonstrate that DRP consistently outperforms baseline methods across multiple metrics.
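The two-stage flow described above (extract structured difference features, then generate from them) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable, the prompt wording, the `dimension | definition | description` line format, and the names `DifferenceFeature`, `extract_differences`, and `generate_review` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


@dataclass
class DifferenceFeature:
    dimension: str    # name of the feature dimension the model chose
    definition: str   # what the dimension means
    description: str  # how the target user differs along it


def extract_differences(
    llm: LLM, target_reviews: List[str], peer_reviews: List[str]
) -> List[DifferenceFeature]:
    """Ask the model to reason about how the target user differs from peers.

    The prompt and the pipe-separated output format are assumptions,
    not the format used in the paper.
    """
    prompt = (
        "Compare the target user's reviews with reviews from representative "
        "users. Choose the dimensions yourself, reflect on whether each one "
        "is supported by the text, and output one line per dimension as "
        "'dimension | definition | description'.\n"
        "Target user:\n" + "\n".join(target_reviews) + "\n"
        "Representative users:\n" + "\n".join(peer_reviews)
    )
    features = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only well-formed lines with three non-empty fields.
        if len(parts) == 3 and all(parts):
            features.append(DifferenceFeature(*parts))
    return features


def generate_review(llm: LLM, item: str, features: List[DifferenceFeature]) -> str:
    """Condition generation on the extracted difference features."""
    profile = "\n".join(f"- {f.dimension}: {f.description}" for f in features)
    return llm(f"Write a review of {item!r} reflecting these user traits:\n{profile}")
```

Because both stages only call the model through prompts, a sketch like this stays training-free; swapping in a reasoning-enhanced backbone would only change the `llm` callable that is passed in.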
