Synthetic Interaction Data for Scalable Personalization in Large Language Models

Yuchen Ma, Yue Huang, Wenjie Wang, Xiaonan Luo, Xiangliang Zhang, Stefan Feuerriegel

TL;DR

This work addresses the data and methodological bottlenecks in personalized LLM alignment. It introduces PersonaGym, a high-fidelity synthetic data generator that models each user's preferences as a dynamic latent process through an agentic multi-LLM setup, and uses it to produce the large-scale PersonaAtlas dataset of interaction trajectories. It then presents PPOpt, a model-agnostic prompt optimization framework that follows a reason-then-optimize paradigm: it infers a user profile from interaction history, then rewrites the prompt to maximize downstream task performance while respecting personalization constraints, and is trained with reinforcement learning. The approach improves personalization quality with minimal degradation in task success and is robust across multiple base LLMs, as supported by synthetic and real-world evaluations and human judgments. Because it leaves the deployed model unchanged, the work offers a scalable, privacy-friendly route to per-user customization, with implications for personalized AI assistants and enterprise LLM deployments.

Abstract

Personalized prompting offers substantial opportunities for deploying large language models (LLMs) to diverse users, yet existing prompt optimization methods focus primarily on task-level optimization and largely overlook the preferences and latent constraints of individual users. This gap is primarily due to (i) the absence of high-quality, privacy-sensitive data that capture personalized user-LLM interactions at scale, and (ii) the lack of robust reward signals for individual preferences. To overcome existing data limitations, we introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process via an agentic LLM system that simulates realistic preference behaviors and semantic-aware noise to generate personalized multi-turn interaction trajectories. Using PersonaGym, we release PersonaAtlas, a large-scale, diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories that closely mirror real-world preference expression and noise patterns. We further propose Personalized Prompt Optimization (PPOpt), a scalable and model-agnostic framework that optimizes user prompts based on interaction histories without modifying the deployed LLM. PPOpt adopts a reason-then-optimize paradigm that infers an explicit user profile and conditions prompt rewriting on that profile to avoid reward hacking. Our training procedure for PPOpt integrates a cold-start supervised prior with outcome-driven multi-objective reinforcement learning. We present extensive experiments that demonstrate consistent improvements over state-of-the-art baselines in terms of task performance, personalization quality, and robustness to both noisy and sparse preference signals.
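
To make the reason-then-optimize idea concrete, the following is a minimal sketch of the two-stage control flow described in the abstract: first infer an explicit user profile from interaction history, then rewrite the prompt conditioned on that profile, leaving the deployed LLM untouched. Everything here is an illustrative assumption rather than the authors' implementation: the names `UserProfile`, `infer_profile`, `rewrite_prompt`, and `personalize` are hypothetical, and simple keyword rules stand in for the learned profile-inference and rewriting policy that PPOpt trains with supervised and reinforcement learning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UserProfile:
    """Explicit, inspectable preference summary (hypothetical schema)."""
    preferences: Dict[str, str] = field(default_factory=dict)


def infer_profile(history: List[str]) -> UserProfile:
    """Stage 1 (reason): derive an explicit profile from past user turns.

    Toy keyword rules are an assumption; they stand in for a learned model.
    """
    prefs: Dict[str, str] = {}
    for turn in history:
        text = turn.lower()
        if "shorter" in text or "too long" in text:
            prefs["length"] = "concise"
        if "example" in text:
            prefs["examples"] = "include a worked example"
    return UserProfile(prefs)


def rewrite_prompt(prompt: str, profile: UserProfile) -> str:
    """Stage 2 (optimize): rewrite using only the prompt and the explicit profile."""
    if not profile.preferences:
        return prompt  # no signal: leave the prompt unchanged
    constraints = "; ".join(
        f"{key}: {value}" for key, value in sorted(profile.preferences.items())
    )
    return f"{prompt}\n\nUser preferences ({constraints})."


def personalize(prompt: str, history: List[str], llm: Callable[[str], str]) -> str:
    """Run both stages, then call the unmodified deployed model (any callable)."""
    return llm(rewrite_prompt(prompt, infer_profile(history)))
```

Because the rewriter in this sketch only sees the explicit profile, each change to the prompt can be traced back to a stated preference, which is the intuition behind conditioning the rewrite on an inferred profile. The deployed model enters only as an opaque callable, mirroring the model-agnostic setting.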

Paper Structure

This paper contains 62 sections, 17 equations, 13 figures, 9 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Overview of our high-fidelity synthetic data generation framework PersonaGym (left) and the unified prompt optimization framework PPOpt (right).
  • Figure 2: Conversation embedding of PersonaAtlas by different domains.
  • Figure 3: Ablation study of PPOpt. We report personalization scores under different training settings: SFT only, RL w/o the profile inference reward, and our full setting.
  • Figure 4: Examples of distractor perturbations at three levels. All examples share the same latent user intent, while the observed queries exhibit increasing degrees of syntactic noise, missing execution-relevant constraints, or semantic ambiguity.
  • Figure 5: Reasoning and reward design for personalized prompt optimization. Unconstrained or absent reasoning leads to misalignment or reward hacking, whereas combining outcome-driven rewards with constrained reasoning causality yields robust personalization.
  • ...and 8 more figures