Extroversion or Introversion? Controlling The Personality of Your Large Language Models

Yanquan Chen; Zhen Wu; Junjie Guo; Shujian Huang; Xinyu Dai

TL;DR

The paper studies how to control the synthetic personalities of large language models. It systematically evaluates three training-stage methods (Continual Pre-training, SFT, RLHF) and inference-time prompts, with personality measured through MBTI-based assessment. It introduces a unified set of metrics (ISR, TIE, TSE, PISR, PIE) and constructs trait- and personality-focused datasets to quantify control efficacy and robustness. A key contribution is Prompt Induction post Supervised Fine-tuning (PISF), which combines SFT and prompting to achieve high efficacy and robustness, outperforming each method used on its own even under reverse prompt induction. The findings have practical implications for safer, context-aware deployment of LLMs, and the paper provides datasets and evaluation tooling to help standardize future work on synthetic personality control.
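The TL;DR names MBTI-based assessment and several success-rate and efficacy metrics without defining them here. As a hedged illustration only (not the paper's definitions of ISR or PISR), the sketch below assumes each questionnaire item yields a single pole letter, forms a four-letter type by per-dimension majority, and tallies the fraction of trials whose assessed type contains a target trait. The function names `mbti_type` and `success_rate` are hypothetical.

```python
from collections import Counter

# The four MBTI dimensions, each a pair of opposing poles.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def mbti_type(answers):
    """Return a four-letter type from per-item pole letters.

    Assumption: `answers` is a list of single letters (e.g. "E", "I", "T"),
    one per questionnaire item, and each item loads on exactly one dimension.
    Ties (including dimensions with no answers) go to the first pole,
    which is an arbitrary choice made for this sketch.
    """
    counts = Counter(answers)  # missing letters count as 0
    letters = []
    for first, second in DIMENSIONS:
        letters.append(first if counts[first] >= counts[second] else second)
    return "".join(letters)


def success_rate(assessed_types, target_letter):
    """Fraction of assessed four-letter types that contain `target_letter`.

    All eight pole letters are distinct, so a membership test is enough.
    """
    if not assessed_types:
        return 0.0
    hits = sum(1 for t in assessed_types if target_letter in t)
    return hits / len(assessed_types)
```

For example, `mbti_type(["E", "E", "I", "N", "T", "F", "T", "J"])` yields `"ENTJ"`, and `success_rate(["ENTJ", "INTJ", "ESFP"], "E")` is two thirds. The paper's actual metrics may weight items or trials differently.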

Abstract

Large language models (LLMs) exhibit robust capabilities in text generation and comprehension, mimicking human behavior and exhibiting synthetic personalities. However, some LLMs have displayed offensive personalities, propagating toxic discourse. Existing literature neglects the origin and evolution of LLM personalities, as well as effective personality control. To fill these gaps, our study embarked on a comprehensive investigation into LLM personality control. We investigated several typical methods of influencing LLMs: three training methods, namely Continual Pre-training, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF), along with inference-phase prompting. Our investigation revealed a hierarchy of control effectiveness: Prompt > SFT > RLHF > Continual Pre-training. Notably, SFT exhibits a higher control success rate than prompt induction. While prompts prove highly effective, we found that prompt-induced personalities are less robust than trained ones: they are more prone to showing conflicting personalities under reverse personality prompt induction. Furthermore, harnessing the strengths of both SFT and prompting, we proposed $\underline{\text{P}}$rompt $\underline{\text{I}}$nduction post $\underline{\text{S}}$upervised $\underline{\text{F}}$ine-tuning (PISF), which emerges as the most effective and robust strategy for controlling LLMs' personality, displaying high efficacy, high success rates, and high robustness. Even under reverse personality prompt induction, LLMs controlled by PISF still exhibit stable and robust personalities.
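The robustness claim above rests on reverse personality prompt induction: a model that was steered toward one pole is prompted toward the opposite pole, and one checks whether the original trait survives. A minimal sketch of tallying that check follows, assuming a hypothetical `assess` callable that runs a questionnaire on the model with a given system prompt and returns the assessed pole letter for one dimension. This is an illustrative harness, not the paper's evaluation code.

```python
from typing import Callable


def reverse_induction_retention(
    assess: Callable[[str], str],
    reverse_prompt: str,
    target_pole: str,
    trials: int,
) -> float:
    """Fraction of trials in which the model keeps `target_pole` under a reverse prompt.

    `assess` is hypothetical: it runs a personality assessment on the
    (trained or prompt-induced) model with `reverse_prompt` as the system
    prompt and returns the assessed pole letter, e.g. "E" or "I".
    Repeated trials matter when decoding is stochastic.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    kept = sum(1 for _ in range(trials) if assess(reverse_prompt) == target_pole)
    return kept / trials
```

Under this framing, a PISF model trained toward "E" would be expected to score higher than a purely prompt-induced "E" model when both receive an introversion-inducing reverse prompt.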

Paper Structure

This paper contains 20 sections, 5 equations, 9 figures, 17 tables.

Introduction
Background: Personality Assessment
Methodology
Personality Dataset Construction for Popular Training Methods
Personality Assessment
Metrics of Personality Control
Preliminary Investigation
Experiments
Setting
Main Results and Analysis
PISF: Prompt Induction post Supervised Fine-tuning
Related Work
Conclusion
Limitations
Ethics Statement
...and 5 more sections

Figures (9)

Figure 1: Overview. We embarked on a comprehensive investigation into personality control with typical methods to influence LLMs.
Figure 2: Instruction Data Generation with Prompt-induced LLMs. Following the Least-to-Most strategy (Zhou et al., 2023), we partitioned the data generation process into two stages: first crafting questions rooted in the Opposite Trait Description, then eliciting responses from Prompt-induced LLMs.
Figure 3: Pretrain Data Distribution.
Figure 4: Personality Assessment Process. $\mathrm{T}$ stands for 'Thinking' trait and $\mathrm{F}$ stands for 'Feeling' trait.
Figure 5: Prompt Induction Performance of the Qwen and Llama2 families. Qwen models used the default generation configuration, while Llama2 models used greedy search for generation.
...and 4 more figures
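Figure 2 describes a two-stage, Least-to-Most style pipeline: first generate questions grounded in the opposite trait description, then answer them with a prompt-induced LLM. The sketch below assumes a hypothetical `llm(system_prompt, user_message) -> str` callable and uses illustrative prompt wording; the paper's actual prompts and post-processing are not given here.

```python
from typing import Callable, Dict, List

# Hypothetical neutral system prompt used for the question-writing stage.
NEUTRAL_SYSTEM = "You are a helpful assistant."


def generate_instruction_data(
    llm: Callable[[str, str], str],
    target_persona_prompt: str,
    opposite_trait_description: str,
    n_questions: int,
) -> List[Dict[str, str]]:
    """Build (instruction, output) pairs in two stages.

    Stage 1: ask a neutrally prompted model for questions rooted in the
    opposite trait description, one per line.
    Stage 2: answer each question with the model induced toward the
    target personality, so outputs express the target trait.
    """
    question_request = (
        f"Write {n_questions} questions, one per line, about situations "
        f"related to this trait description: {opposite_trait_description}"
    )
    raw = llm(NEUTRAL_SYSTEM, question_request)
    # Drop blank lines and cap the count, since the model may over-generate.
    questions = [q.strip() for q in raw.splitlines() if q.strip()][:n_questions]
    return [
        {"instruction": q, "output": llm(target_persona_prompt, q)}
        for q in questions
    ]
```

Splitting generation into two calls mirrors the Least-to-Most idea of solving a simpler subproblem (writing questions) before the harder one (answering them in character), and keeps each prompt focused on a single task.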
