
Towards Proactive Personalization through Profile Customization for Individual Users in Dialogues

Xiaotian Zhang, Yuan Wang, Ruizhe Chen, Zeya Wang, Runchen Hou, Zuozhu Liu

TL;DR

The paper addresses the challenge of long-term, user-specific personalization in dialog systems by introducing PersonalAgent, a memory-enabled agent that incrementally infers and stores user preferences as a unified profile across sessions. It formalizes turn-level preference inference as a multi-turn MDP and trains the agent with Group Relative Policy Optimization, using a policy-based judge to provide robust feedback. The approach is validated on ALOE, PrefEval, and a new ALOE-Unseen dataset for cold-start scenarios, showing superior accuracy, proactive querying, and strong cross-session consistency, with human annotations corroborating the evaluation signals. The work emphasizes memory-based personalization as a pathway to more natural, inclusive, and adaptive conversational agents and outlines directions for broader, longer-horizon evaluations.
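The TL;DR states that the agent is trained with Group Relative Policy Optimization (GRPO). As a minimal sketch, assuming the standard GRPO formulation, the group-relative advantage scores each sampled output by how far its reward sits from the mean of its own sampling group, scaled by the group's standard deviation. The sketch omits the clipped probability-ratio objective and the KL penalty. The function name `group_relative_advantages` and the reward values are hypothetical; in the paper the rewards come from a policy-based judge whose exact design is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group; eps guards a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for four candidate outputs sampled for the same turn.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean rather than a learned value function, outputs that beat their siblings receive positive advantages and the rest receive negative ones. If every sample gets the same reward, all advantages are zero and that group contributes no policy-gradient signal.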

Abstract

The deployment of Large Language Models (LLMs) in interactive systems necessitates a deep alignment with the nuanced and dynamic preferences of individual users. Current alignment techniques predominantly address universal human values or static, single-turn preferences, thereby failing to address the critical needs of long-term personalization and the initial user cold-start problem. To bridge this gap, we propose PersonalAgent, a novel user-centric lifelong agent designed to continuously infer and adapt to user preferences. PersonalAgent constructs and dynamically refines a unified user profile by decomposing dialogues into single-turn interactions, framing preference inference as a sequential decision-making task. Experiments show that PersonalAgent achieves superior performance over strong prompt-based and policy optimization baselines, not only in idealized but also in noisy conversational contexts, while preserving cross-session preference consistency. Furthermore, human evaluation confirms that PersonalAgent excels at capturing user preferences naturally and coherently. Our findings underscore the importance of lifelong personalization for developing more inclusive and adaptive conversational agents. Our code is publicly available.
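The abstract frames preference inference as a sequential decision-making task over single-turn interactions. The sketch below shows only that decomposition, under stated assumptions. The dialogue is split into turns. At each step, the current profile (the state) and the current turn yield an updated profile (the result of the action). `decompose_dialogue`, `toy_update`, and the `Transition` fields are hypothetical names. `toy_update` is a rule-based stand-in for the learned LLM policy, not the paper's method.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Turn:
    user: str
    assistant: str


Profile = dict
# Hypothetical policy interface: (current profile, current turn) -> updated profile.
UpdateFn = Callable[[Profile, Turn], Profile]


@dataclass(frozen=True)
class Transition:
    profile_before: Profile  # state at step t: the profile accumulated so far
    turn: Turn               # the single-turn interaction processed at step t
    profile_after: Profile   # profile produced by the update (action) at step t


def decompose_dialogue(
    dialogue: list, update_fn: UpdateFn, profile: Optional[Profile] = None
) -> tuple:
    """Process a multi-turn dialogue turn by turn, threading the profile through."""
    profile = dict(profile or {})
    steps = []
    for turn in dialogue:
        new_profile = update_fn(profile, turn)
        steps.append(Transition(profile, turn, new_profile))
        profile = new_profile
    return steps, profile


def toy_update(profile: Profile, turn: Turn) -> Profile:
    """Hypothetical rule-based stand-in for the learned policy (never mutates its input)."""
    text = turn.user.strip().rstrip(".")
    prefix = "I prefer "
    if text.startswith(prefix):
        item = text[len(prefix):]
        return {**profile, "preferences": profile.get("preferences", []) + [item]}
    return profile


dialogue = [
    Turn("Hi, can you help me plan dinner?", "Sure, any preferences?"),
    Turn("I prefer vegetarian food.", "Noted, vegetarian it is."),
]
steps, final_profile = decompose_dialogue(dialogue, toy_update)
```

Each `Transition` is a self-contained training step. A policy can therefore be optimized per turn without feeding the full history, because everything it needs from earlier turns is carried in `profile_before`.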

Paper Structure

This paper contains 25 sections, 12 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: PersonalAgent is inspired by the way humans communicate with one another. Rather than taking the entire conversation history $\mathcal{H}$ as input, it processes a multi-turn dialogue $\bm{c}$ turn by turn, iteratively recording relevant information in a user profile $\mathcal{P}$. Finally, the agent uses the profile $\mathcal{P}$, stored across sessions, to decide whether further querying is needed before generating a response $\bm{r}$ to the user request $\bm{q}$.
  • Figure 2: We define eleven major categories that cover diverse dimensions of user preferences, aiming to comprehensively record and customize each user's personalized profile. The specific categories are listed in the profile template figure.
  • Figure 3: Alignment Level (AL) comparison with the baseline on the ALOE dataset; we report the average AL score (%).
  • Figure 4: Comparison of models trained with different reward designs. Experiments are conducted on the PrefEval, ALOE, and ALOE-Unseen benchmarks, and results are reported in terms of accuracy (%).
  • Figure 5: Comparison of the long-term alignment of PersonalAgent and baselines on the PrefEval and ALOE datasets, where irrelevant dialogue turns are inserted following the user preference dialogue.
  • ...and 6 more figures
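Figures 1 and 2 describe a categorized user profile that is stored across sessions and used to decide whether further querying is needed before responding. The following is a minimal sketch of that decision, with two assumptions in place of the paper's method. A keyword router (`KEYWORDS`) determines which categories a request depends on. Three hypothetical category names stand in for the paper's eleven, which are not listed here. The agent asks a proactive question when a needed category is still empty and responds from the profile otherwise. `UserProfile`, `relevant_categories`, and `next_action` are hypothetical names; the paper's agent makes this decision with a learned LLM policy.

```python
from dataclasses import dataclass, field

# Hypothetical category names; the paper defines eleven categories not reproduced here.
CATEGORIES = ("food", "travel", "communication_style")

# Hypothetical keyword routing from a request to the profile categories it depends on.
KEYWORDS = {"dinner": "food", "restaurant": "food", "trip": "travel", "flight": "travel"}


@dataclass
class UserProfile:
    """Persistent per-user profile: one list of recorded preferences per category."""
    entries: dict = field(default_factory=lambda: {c: [] for c in CATEGORIES})

    def record(self, category: str, preference: str) -> None:
        if category not in self.entries:
            raise KeyError(f"unknown category: {category}")
        if preference not in self.entries[category]:
            self.entries[category].append(preference)


def relevant_categories(request: str) -> list:
    """Return the sorted categories a request touches, via simple keyword matching."""
    words = [w.strip(".,?!") for w in request.lower().split()]
    return sorted({KEYWORDS[w] for w in words if w in KEYWORDS})


def next_action(profile: UserProfile, request: str) -> tuple:
    """Ask a clarifying question if a needed category is empty; otherwise respond."""
    needed = relevant_categories(request)
    missing = [c for c in needed if not profile.entries[c]]
    if missing:
        return ("ask", f"Before I answer, could you tell me your {missing[0]} preferences?")
    return ("respond", {c: profile.entries[c] for c in needed})


profile = UserProfile()  # persists across sessions
first = next_action(profile, "Recommend a restaurant for dinner?")  # empty profile: ask
profile.record("food", "vegetarian")  # preference learned from the user's answer
second = next_action(profile, "Any dinner ideas?")  # later session: respond from profile
```

Keeping the profile outside any single conversation is what lets the second request, possibly in a later session, be answered without re-asking. A cold-start user simply begins with every category empty.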