Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM

Zizhao Hu, Mohammad Rostami, Jesse Thomason

Abstract

Persona prompting can steer LLM generation towards a domain-specific tone and pattern. This behavior enables use cases in multi-agent systems, where diverse interactions are crucial, and in human-centered tasks that require a high level of human alignment. Prior work reports mixed results on the utility of personas: some studies report performance gains from expert personas in certain domains, as well as greater data diversity in synthetic data creation, while others find near-zero or negative impact on general utility. To fully leverage the benefits of LLM personas and avoid their harms, a more comprehensive investigation of the mechanism is needed. In this work, we study how model optimization, task type, prompt length, and prompt placement affect expert persona effectiveness across instruction-tuned and reasoning LLMs, and provide insight into the conditions under which expert personas fail and succeed. Based on our findings, we develop PRISM (Persona Routing via Intent-based Self-Modeling), a pipeline that self-distills an intent-conditioned expert persona into a gated LoRA adapter through a bootstrapping process requiring no external data, models, or knowledge. PRISM enhances human preference and safety alignment on generative tasks while maintaining accuracy on discriminative tasks across all models, with minimal memory and compute overhead.

Paper Structure

This paper contains 70 sections, 9 equations, 4 figures, 16 tables.

Figures (4)

  • Figure 1: Expert persona impact across models, tasks, granularity, and placement. (a) On MT-Bench, long expert personas help in 5/8 categories (Writing, Roleplay, Reasoning, Extraction, STEM), with the strongest gains in Extraction (+0.65) and STEM (+0.60). (b) On MMLU, all expert persona variants damage accuracy, with the minimum persona suffering the least (overall: 68.0% vs. 71.6% baseline). (c) A dedicated "Safety Monitor" expert persona boosts attack refusal rates across all benchmarks, with the long persona achieving the largest gain on JailbreakBench (+17.7%). (d) Across models, expert persona impact depends on the model, the placement, and the task.
  • Figure 2: Panels (a–c): Instruction-tuned model (Qwen2.5-7B-Instruct). Panels (d–f): Reasoning-distilled models (average of 2 R1 variants). (a, d) Per-category score lift of each persona over the no-persona baseline on MT-Bench: Writing (Wr), Roleplay (Ro), Reasoning (Re), Math (Ma), Coding (Co), Extraction (Ex), STEM (St), Humanities (Hu). Diagonal = expert persona; blue = gain; red = loss. (b, e) Each expert persona's effect across all tasks; the zero line represents the base model. In (b), most expert personas fall below zero, showing that an expert persona generally damages overall performance for instruction-tuned models. In (e), the pattern reverses: expert personas improve overall performance for reasoning models, driven by three categories (Re, Co, St) that dominate the R1 distillation training set, confirming that model optimization directly determines whether a persona can provide improvement. (c, f) Each expert persona's utility on its matching domain compared to a random persona. Near-flat bars in (f) indicate that gains are context-driven rather than expertise-specific.
  • Figure 3: Top row: Two simple approaches to automate expert persona selection. Approach 1 (left): a router selects the appropriate persona prompt per query at inference time; however, this is expensive and the expert persona might not always improve performance. Approach 2 (right): supervised finetuning on domain expert data bakes persona behavior directly into model weights; however, expert persona training data is hard to collect and base model performance is damaged. Bottom row: The five-stage PRISM training pipeline, which addresses both limitations: (1) Query Generation conditioned on persona prompts, (2) Answer with Persona generating multi-persona responses, (3) Self-Verification for distillation set selection via pairwise comparison, (4) Router/Gate Training to learn intent-based routing that decides when persona activation helps, and (5) Self-Distillation via LoRA to internalize persona behaviors.
  • Figure 4: Percentage routed to LoRA vs. expert persona effect across 15 categories. MMLU (low), safety (high), MT-Bench (mixed). Correlation: $r = 0.65$, $\rho = 0.75$.
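
The gated LoRA adapter named in the abstract, and the routing decision described in stage (4) of Figure 3, can be illustrated with a small sketch. This is not the authors' implementation: the sigmoid gate, the fixed threshold, the toy matrices, and every name below (`gate_score`, `gated_lora_forward`, `GATE_W`, and so on) are assumptions made for illustration only. It shows one plausible way a gate could leave the base model untouched for some inputs and add a low-rank update for others.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gate_score(x, gate_w, gate_b):
    """Sigmoid score in (0, 1); a hypothetical stand-in for an intent router."""
    logit = sum(w * v for w, v in zip(gate_w, x)) + gate_b
    return 1.0 / (1.0 + math.exp(-logit))


def gated_lora_forward(x, W, A, B, gate_w, gate_b, threshold=0.5):
    """Return (output, routed_to_lora) for one input vector.

    Assumed behavior, not taken from the paper: if the gate score is below
    the threshold, the base projection W x is returned unchanged; otherwise
    the low-rank update B (A x) is added to it.
    """
    base = matvec(W, x)
    if gate_score(x, gate_w, gate_b) < threshold:
        return base, False
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)], True


# Toy values chosen only so the two branches are easy to follow by hand.
W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (identity)
A = [[1.0, 1.0]]               # rank-1 down-projection (1 x 2)
B = [[1.0], [0.0]]             # rank-1 up-projection (2 x 1)
GATE_W = [10.0, 0.0]           # gate looks only at the first feature
GATE_B = -5.0

print(gated_lora_forward([1.0, 2.0], W, A, B, GATE_W, GATE_B))
print(gated_lora_forward([0.0, 2.0], W, A, B, GATE_W, GATE_B))
```

In this sketch the first input opens the gate and receives the low-rank update, while the second falls below the threshold and passes through the base weights only. That second path is the property that would let accuracy on discriminative tasks stay at the base model's level.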