The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models

Zhixiang Wang

TL;DR

This work addresses the instability introduced by weight-based personalization of large language models. It proposes a geometric framework, the Soul Engine, built on the Linear Representation Hypothesis, under which personality is posited to reside in orthogonal latent subspaces. Using SoulBench and a dual-head, stratified-freezing encoder, the authors demonstrate deterministic, zero-shot personality steering via latent vector arithmetic without degrading core reasoning. Psychometric precision (MSE ≈ 0.0113) and qualitative geometry analyses (t-SNE) support the separability and continuity of personality manifolds, while ablations identify a mid-network sweet spot (layers 14-16) for intervention. The approach offers a non-destructive alternative to fine-tuning that could enable robust, controllable personalization, and it motivates future work on scaling and on safety mechanisms for latent interventions.

Abstract

Background: The deployment of personalized Large Language Models (LLMs) is currently constrained by the stability-plasticity dilemma. Prevailing alignment methods, such as Supervised Fine-Tuning (SFT), rely on stochastic weight updates that often incur an "alignment tax": a degradation of general reasoning capabilities.

Methods: We propose the Soul Engine, a framework based on the Linear Representation Hypothesis, which posits that personality traits exist as orthogonal linear subspaces. We introduce SoulBench, a dataset constructed via dynamic contextual sampling. Using a dual-head architecture on a frozen Qwen-2.5 base, we extract disentangled personality vectors without modifying the backbone weights.

Results: Our experiments demonstrate three breakthroughs. First, High-Precision Profiling: the model achieves a Mean Squared Error (MSE) of 0.011 against psychological ground truth. Second, Geometric Orthogonality: t-SNE visualization confirms that personality manifolds are distinct and continuous, allowing for "Zero-Shot Personality Injection" that maintains original model intelligence. Third, Deterministic Steering: we achieve robust control over behavior via vector arithmetic, validated through extensive ablation studies.

Conclusion: This work challenges the necessity of fine-tuning for personalization. By transitioning from probabilistic prompting to deterministic latent intervention, we provide a mathematically rigorous foundation for safe, controllable AI personalization.
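The abstract's central mechanism, steering by vector arithmetic in an orthogonal subspace, can be illustrated with a toy example. The sketch below is not the Soul Engine itself: the function names `unit` and `steer`, the 64-dimensional hidden state, and the random "persona" and "reasoning" directions are all assumptions made for illustration, and the boost value of 6.0 is borrowed from the range reported in Figure 3. It only shows why, if a personality direction is orthogonal to a reasoning direction, adding a scaled personality vector leaves the reasoning component unchanged.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def steer(hidden, direction, boost):
    """Add a boost-scaled unit direction to a hidden state (hypothetical helper)."""
    return hidden + boost * unit(direction)

rng = np.random.default_rng(0)
dim = 64  # toy width chosen for illustration, not the real model's hidden size

# Hypothetical directions: one "personality" axis and one "reasoning" axis.
persona = unit(rng.normal(size=dim))
reasoning = rng.normal(size=dim)
# Gram-Schmidt step: remove the persona component so the two axes are orthogonal.
reasoning = unit(reasoning - (reasoning @ persona) * persona)

hidden = rng.normal(size=dim)
steered = steer(hidden, persona, boost=6.0)

# Change in each component caused by the intervention.
persona_shift = (steered - hidden) @ persona      # equals the boost
reasoning_shift = (steered - hidden) @ reasoning  # approximately zero
```

In this idealized setting the shift along the persona axis equals the boost and the shift along the reasoning axis vanishes up to floating-point error. How closely a real model's representations satisfy the orthogonality assumption is an empirical question that the paper's experiments address.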

Paper Structure

This paper contains 26 sections, 8 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: The Soul Engine Architecture. The lower layers (grey) are frozen to preserve general intelligence. The upper layers (blue) are fine-tuned. The embedding is projected into orthogonal Identity and Psychometric spaces.
  • Figure 2: Geometric Separation of Personality Manifolds. t-SNE projection of 1,000 character embeddings. Points are colored by their "Openness" score. The clear gradient separation indicates that the Soul Encoder has mapped discrete psychological traits onto a continuous geometric manifold.
  • Figure 3: Steering Heatmap. The "sweet spot" for stable control lies around layers 14-16 with a boost factor of 6.0-8.0. In this region, the model achieves high adherence to the target personality (dark blue) without linguistic collapse, as shown by a high Sanity score (dark green).
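
The sweet-spot reading of Figure 3 amounts to finding grid cells where adherence and sanity are both high. A minimal sketch of that selection follows, assuming two score grids indexed by layer and boost factor. The function name `sweet_spot`, the 0.8 thresholds, and every number in the grids are synthetic placeholders chosen for illustration, not values from the paper.

```python
import numpy as np

def sweet_spot(adherence, sanity, layers, boosts,
               min_adherence=0.8, min_sanity=0.8):
    """Return (layer, boost) pairs where both scores clear their thresholds."""
    ok = (adherence >= min_adherence) & (sanity >= min_sanity)
    return [(layers[i], boosts[j]) for i, j in np.argwhere(ok)]

# Synthetic placeholder grids, NOT results from the paper.
# Rows correspond to layers, columns to boost factors.
layers = [10, 14, 18]
boosts = [2.0, 6.0, 10.0]
adherence = np.array([[0.2, 0.5, 0.90],
                      [0.3, 0.9, 0.95],
                      [0.1, 0.4, 0.60]])
sanity = np.array([[0.9, 0.90, 0.3],
                   [0.9, 0.85, 0.2],
                   [0.9, 0.90, 0.7]])

cells = sweet_spot(adherence, sanity, layers, boosts)
print(cells)  # [(14, 6.0)] for this toy grid
```

With these toy numbers, only the middle layer at the middle boost passes both thresholds: low boosts fail on adherence, and high boosts fail on sanity.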