Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models

Pranav Bhandari; Usman Naseem; Mehwish Nasim

TL;DR

This work tests whether Big Five personality steering directions in large language models are geometrically independent by applying a suite of constraint schemes (C0–C5) to two model families (LLaMA-3-8B-Instruct and Mistral-8B-Instruct). It finds that steering one trait consistently induces changes in others, and that enforcing orthogonality (including hard Löwdin-based orthonormalisation) does not reliably eliminate cross-trait bleed and often reduces fluency. Cross-model analyses reveal similar dependency patterns across models, though model-specific differences in responsiveness point to training and alignment factors beyond geometry. The results imply that personality traits in LLMs occupy a slightly coupled subspace, limiting fully independent trait control and highlighting practical trade-offs for activation-engineering approaches.
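
The Löwdin (symmetric) orthonormalisation mentioned above replaces a set of directions V (one per row) with W = S^{-1/2} V, where S = V V^T is their Gram matrix. Unlike Gram–Schmidt, it does not depend on the order in which the traits are listed, and it gives the orthonormal set closest to the original directions in the least-squares sense. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes each trait direction is a plain list of floats normalised to unit length, and it uses a cyclic Jacobi eigen-solver so that it needs no third-party packages. All helper names (`gram`, `jacobi_eigh`, `lowdin_orthonormalise`, `max_abs_cosine`) and the toy vectors are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalise(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def gram(rows: Matrix) -> Matrix:
    """Gram matrix S[i][j] = <v_i, v_j> of the trait directions (rows)."""
    return [[dot(u, v) for v in rows] for u in rows]


def jacobi_eigh(sym: Matrix, tol: float = 1e-20, max_sweeps: int = 100) -> Tuple[Vector, Matrix]:
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, Q) where column k of Q is the eigenvector for eigenvalue k.
    """
    n = len(sym)
    a = [row[:] for row in sym]
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if abs(a[p][r]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, r) entry becomes zero.
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, r)
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # A <- J^T A (rows p, r)
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
                for k in range(n):  # Q <- Q J accumulates the eigenvectors
                    qkp, qkr = q[k][p], q[k][r]
                    q[k][p], q[k][r] = c * qkp - s * qkr, s * qkp + c * qkr
    return [a[i][i] for i in range(n)], q


def lowdin_orthonormalise(rows: Matrix) -> Matrix:
    """Symmetric (Löwdin) orthonormalisation: W = S^{-1/2} V with S = V V^T."""
    n, dim = len(rows), len(rows[0])
    evals, q = jacobi_eigh(gram(rows))
    if min(evals) < 1e-10:
        raise ValueError("trait directions are (nearly) linearly dependent")
    # S^{-1/2} = Q diag(lambda^{-1/2}) Q^T
    s_inv_sqrt = [[sum(q[i][k] * q[j][k] / math.sqrt(evals[k]) for k in range(n))
                   for j in range(n)] for i in range(n)]
    return [[sum(s_inv_sqrt[i][k] * rows[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)]


def max_abs_cosine(rows: Matrix) -> float:
    """Largest |cos| between distinct directions (0 means mutually orthogonal)."""
    unit = [normalise(v) for v in rows]
    return max(abs(dot(unit[i], unit[j]))
               for i in range(len(unit)) for j in range(len(unit)) if i != j)


# Hypothetical stand-ins for five trait directions: random vectors sharing a common
# component so that they start out correlated (toy data, not vectors from the paper).
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(16)]
traits = [normalise([rng.gauss(0, 1) + 0.8 * s for s in shared]) for _ in range(5)]
ortho = lowdin_orthonormalise(traits)
print(f"max |cos| before: {max_abs_cosine(traits):.3f}")
print(f"max |cos| after:  {max_abs_cosine(ortho):.2e}")
```

Because W W^T = S^{-1/2} S S^{-1/2} = I, the pairwise cosines drop to rounding error. As the TL;DR notes, this geometric independence does not by itself guarantee behavioural independence of the steered traits.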

Abstract

Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the geometric relationships between Big Five personality steering directions. We study steering vectors extracted from two model families (LLaMA-3-8B and Mistral-8B) and apply a range of geometric conditioning schemes, from unconstrained directions to soft and hard orthonormalisation. Our results show that personality steering directions exhibit substantial geometric dependence: steering one trait consistently induces changes in others, even when linear overlap is explicitly removed. While hard orthonormalisation enforces geometric independence, it does not eliminate cross-trait behavioural effects and can reduce steering strength. These findings suggest that personality traits in LLMs occupy a slightly coupled subspace, limiting fully independent trait control.
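
Trait-vector injection is often implemented in an additive form, h' = h + α v̂, in which a scaled unit trait direction is added to one layer's hidden state. The sketch below assumes that form; it is not the authors' code, and the helpers `steer`, `trait_readout` and `cross_trait_shift`, together with the 4-d toy vectors, are hypothetical. It shows the first-order geometric reason that steering one trait moves the others: adding α v̂_i shifts the linear read-out of trait j by α cos(v_i, v_j), so only orthogonal directions leave each other's read-out unchanged. The sketch covers only the linear projection at the steered layer. The behavioural cross-trait effects reported in the paper are measured on generated text and are not captured here.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def unit(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Additive activation steering: h' = h + alpha * v_hat."""
    v = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, v)]


def trait_readout(hidden: Vector, directions: List[Vector]) -> List[float]:
    """Linear read-out of each trait: projection of h onto each unit trait direction."""
    return [dot(hidden, unit(d)) for d in directions]


def cross_trait_shift(hidden: Vector, directions: List[Vector],
                      target: int, alpha: float) -> List[float]:
    """Change in every trait's read-out when only `target` is steered."""
    before = trait_readout(hidden, directions)
    after = trait_readout(steer(hidden, directions[target], alpha), directions)
    return [a - b for a, b in zip(after, before)]


# Hypothetical 4-d example with three trait directions; the first two overlap.
directions = [[1.0, 0.0, 0.0, 0.0],   # trait A
              [0.6, 0.8, 0.0, 0.0],   # trait B: cos(A, B) = 0.6
              [0.0, 0.0, 1.0, 0.0]]   # trait C: orthogonal to A and B
hidden = [0.2, -0.1, 0.5, 0.3]
shift = cross_trait_shift(hidden, directions, target=0, alpha=4.0)
print([round(s, 6) for s in shift])  # -> [4.0, 2.4, 0.0]
```

Steering trait A with α = 4 moves B's read-out by 4 × 0.6 = 2.4 and leaves the orthogonal direction C unchanged.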

Paper Structure

This paper contains 11 sections, 1 figure, and 12 tables.

Introduction and Background
Methodology
Steering Mechanism
Trait Direction Conditioning (C0–C5)
Evaluation
Results
Geometric Independence of Personality Steering Directions
RQ3: Cross-Model Consistency of Trait Dependencies
Conclusion
Ethical Considerations
Detailed tables of trait values for C0–C5

Figures (1)

Figure 1: Fluency profile analysis for conditions C0–C5 compared against Base steering. Fluency degrades under both positive and negative steering, across all traits, for every conditioning method. Although the trait shifts were comparable to the base values, this significant loss of fluency suggests that orthogonalised vectors should be used with care for steering.
