PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra

Xiachong Feng, Liang Zhao, Weihong Zhong, Yichong Huang, Yuxuan Gu, Lingpeng Kong, Xiaocheng Feng, Bing Qin

TL;DR

PERSONA is a training-free framework that achieves fine-tuning-level performance by directly manipulating personality vectors in activation space, providing evidence that aspects of LLM personality are mathematically tractable.

Abstract

Current methods for personality control in large language models (LLMs) rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning-level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that support algebraic operations. The framework operates in three stages: Persona-Base extracts these trait vectors via contrastive activation analysis; Persona-Algebra enables precise control through vector arithmetic (scalar multiplication for intensity, addition for composition, subtraction for suppression); and Persona-Flow achieves context-aware adaptation by dynamically composing these vectors during inference. On PersonalityBench, our approach achieves a mean score of 9.60 without any gradient updates, nearly matching the supervised fine-tuning upper bound of 9.61. On our proposed Persona-Evolve benchmark for dynamic personality adaptation, we achieve win rates of up to 91% across diverse model families. These results provide evidence that aspects of LLM personality are mathematically tractable, opening new directions for interpretable and efficient behavioral control.
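The first two stages can be illustrated with a minimal sketch. It assumes Persona-Base uses a difference-of-means contrast (mean activation on trait-expressing prompts minus mean activation on trait-opposite prompts, at one layer), which is one common form of contrastive activation analysis; the paper's exact layer choice, normalization, and prompt sets are not stated here. The helper names `extract_trait_vector` and `steer`, and the toy activations, are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def extract_trait_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Hypothetical Persona-Base step (assumed difference-of-means contrast):
    mean activation on trait-expressing prompts minus mean activation on
    trait-opposite prompts, normalized to unit length."""
    diff = [p - q for p, q in zip(mean(pos_acts), mean(neg_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def steer(hidden: Vector, traits: Dict[str, Vector], coeffs: Dict[str, float]) -> Vector:
    """Hypothetical Persona-Algebra step: h' = h + sum_k alpha_k * v_k.
    A larger |alpha_k| changes intensity, several nonzero alphas compose
    traits, and a negative alpha_k subtracts (suppresses) that trait."""
    out = list(hidden)
    for name, alpha in coeffs.items():
        out = [x + alpha * v for x, v in zip(out, traits[name])]
    return out


# Toy 3-d activations; real hidden states have thousands of dimensions.
extraversion = extract_trait_vector(
    pos_acts=[[3.0, 1.0, 0.0], [1.0, -1.0, 0.0]],
    neg_acts=[[-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]],
)
conscientiousness = extract_trait_vector(
    pos_acts=[[0.0, 0.0, 1.5]],
    neg_acts=[[0.0, 0.0, -0.5]],
)
traits = {"extraversion": extraversion, "conscientiousness": conscientiousness}

# Amplify extraversion and suppress conscientiousness in a single step.
h = [0.5, 0.5, 0.5]
steered = steer(h, traits, {"extraversion": 2.0, "conscientiousness": -1.0})
print(steered)  # prints [2.5, 0.5, -0.5]
```

Persona-Flow would additionally choose these coefficients per context during inference; the abstract does not say how, so the sketch fixes them by hand. In a real model the steered hidden state would be written back at the chosen layer during the forward pass, and that placement is also an assumption here.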

Paper Structure (70 sections, 1 equation, 15 figures, 23 tables)

Figures (15)

  • Figure 1: The PERSONA framework.
  • Figure 2: Cosine similarity between persona vectors.
  • Figure 3: Linear relationship between steering coefficients and BFI dimension scores. All vectors except dependable show strong linear modulation (high R²). We conjecture that the dependable vector saturates because the baseline model is already optimized for conscientiousness.
  • Figure 4: BFI-44 score changes after vector arithmetic operations. Y-axis shows operation and target dimension with expected direction (arrows). Grey: baseline scores; colored: post-steering scores (green for expected increases, red for decreases).
  • Figure 5: An example from Persona-Evolve, with a comparison between the vanilla answer and the answer steered by Persona-Flow. By reflecting on the scenario and suppressing conscientiousness accordingly, Persona-Flow can produce a more natural, contextually appropriate response that better aligns with the anticipated feeling of being overwhelmed.
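Figures 2 and 3 rest on two simple diagnostics: pairwise cosine similarity between persona vectors (values near 0 indicate near-orthogonal directions) and a straight-line fit of BFI score against steering coefficient, summarized by R². The sketch below computes both. All vectors and scores are made-up illustrative values, not the paper's data, and whether the paper fits exactly this ordinary-least-squares model is an assumption.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; values near 0 mean the directions are close to orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit ys ~ slope * xs + intercept; returns (slope, intercept, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up vectors standing in for two extracted persona vectors (Figure 2 style check).
v_extraversion = [0.9, 0.1, 0.0, 0.4]
v_agreeableness = [-0.1, 0.8, 0.5, 0.1]
print(f"cosine similarity: {cosine(v_extraversion, v_agreeableness):.3f}")

# Made-up steering coefficients and BFI scores (Figure 3 style check).
coeffs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear_scores = [2.0, 2.6, 3.1, 3.5, 4.1]      # responds roughly linearly
saturating_scores = [4.0, 4.4, 4.6, 4.6, 4.6]  # plateaus at high coefficients
slope, intercept, r2 = linear_fit(coeffs, linear_scores)
_, _, r2_sat = linear_fit(coeffs, saturating_scores)
print(f"linear: slope={slope:.2f}, R^2={r2:.3f}; saturating: R^2={r2_sat:.3f}")
```

A plateau like the second series, which the Figure 3 caption reports for the dependable vector, lowers R² even though the trend is still monotone.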