PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
Bowen Jiang, Yuan Yuan, Maohao Shen, Zhuoqun Hao, Zhangchen Xu, Zichen Chen, Ziyi Liu, Anvesh Rao Vijjini, Jiashu He, Hanchao Yu, Radha Poovendran, Gregory Wornell, Lyle Ungar, Dan Roth, Sihao Chen, Camillo Jose Taylor
TL;DR
The paper addresses the challenge of inferring implicit user personas from long, noisy interactions and delivering personalized, context-aware responses. It introduces PersonaMem-v2, a large-scale dataset with 1,000 implicit personas, 20k+ preferences, and 128k-token contexts, plus multi-session histories and a privacy-aware design, enabling reinforcement learning and agentic memory experiments. Through GRPO-based reinforcement fine-tuning, a 4B reasoning model surpasses GPT-5 on implicit personalization (53% accuracy), and an agentic memory framework compresses 32k-token histories into a 2k-token human-readable memory, reaching state-of-the-art 55% accuracy with 16x fewer input tokens. Together, these contributions point to a scalable path toward real-world personalized intelligence with interpretable memory and stronger alignment to individual user needs.
Abstract
Personalization is one of the next milestones in advancing AI capability and alignment. We introduce PersonaMem-v2, a state-of-the-art dataset for LLM personalization that simulates 1,000 realistic user-chatbot interactions across 300+ scenarios, with 20,000+ user preferences and 128k-token context windows, where most user preferences are revealed implicitly to reflect real-world interactions. Using this data, we investigate how reinforcement fine-tuning enables a model to improve its long-context reasoning capabilities for user understanding and personalization. We also develop a framework for training an agentic memory system, which maintains a single, human-readable memory that grows with each user over time. In our experiments, frontier LLMs still struggle with implicit personalization, achieving only 37-48% accuracy. Although they support long context windows, reasoning remains the bottleneck for implicit personalization tasks. Using reinforcement fine-tuning, we successfully train Qwen3-4B to outperform GPT-5, reaching 53% accuracy in implicit personalization. Moreover, our agentic memory framework achieves state-of-the-art 55% accuracy while using 16x fewer input tokens, relying on a 2k-token memory instead of full 32k-token conversation histories. These results underscore the impact of our dataset and demonstrate agentic memory as a scalable path toward real-world personalized intelligence.
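To make the memory idea above concrete, here is a minimal Python sketch of a single, bounded, human-readable memory that is updated session by session. It is an illustration under stated assumptions, not the paper's implementation: `naive_rewrite` is a hypothetical placeholder for the trained memory model, `count_tokens` approximates tokens by whitespace-separated words, and truncating to the most recent words is an assumed way to enforce the budget. Only the 2k and 32k token figures come from the abstract.

```python
from typing import Callable, Iterable

MEMORY_BUDGET_TOKENS = 2_000   # memory size reported in the abstract
HISTORY_TOKENS = 32_000        # full-history size reported in the abstract


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def truncate_to_budget(text: str, budget: int) -> str:
    """Keep the most recent `budget` words so the memory stays within its cap."""
    words = text.split()
    return " ".join(words[-budget:])


def naive_rewrite(memory: str, session: str) -> str:
    """Hypothetical placeholder for the trained model: appends the session as a note."""
    return (memory + "\n- " + session).strip()


def update_memory(memory: str, session: str,
                  rewrite: Callable[[str, str], str] = naive_rewrite,
                  budget: int = MEMORY_BUDGET_TOKENS) -> str:
    """Fold one session into the memory, then enforce the token budget."""
    return truncate_to_budget(rewrite(memory, session), budget)


def build_memory(sessions: Iterable[str]) -> str:
    """Process sessions in order, carrying a single memory string forward."""
    memory = ""
    for session in sessions:
        memory = update_memory(memory, session)
    return memory


# Ratio of full-history input to memory input, using the abstract's figures.
input_reduction = HISTORY_TOKENS / MEMORY_BUDGET_TOKENS
```

In this sketch the answering model would read only the string returned by `build_memory` rather than the full history, which is where the reduction in input tokens comes from; with the abstract's figures, `input_reduction` is 32,000 / 2,000 = 16.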
