
Extroversion or Introversion? Controlling The Personality of Your Large Language Models

Yanquan Chen, Zhen Wu, Junjie Guo, Shujian Huang, Xinyu Dai

TL;DR

The paper addresses the problem of controllable synthetic personalities in large language models by systematically evaluating three training-stage methods (Continual Pre-training, SFT, RLHF) and inference-time prompts, grounded in MBTI-based personality assessment. It introduces a unified set of metrics (ISR, TIE, TSE, PISR, PIE) and constructs trait- and personality-focused datasets to quantify control efficacy and robustness. A key contribution is Prompt Induction post Supervised Fine-tuning (PISF), which combines SFT with prompting to achieve high efficacy and robustness, outperforming each individual method even under reverse personality prompt induction. The findings have practical implications for safer, context-aware deployment of LLMs, and the paper provides datasets and evaluation tooling to help standardize future work on synthetic personality control. Overall, the work clarifies how training and prompting choices shape LLM personalities and offers a concrete, robust method (PISF) for reliable personality control.

Abstract

Large language models (LLMs) exhibit robust capabilities in text generation and comprehension, mimicking human behavior and exhibiting synthetic personalities. However, some LLMs have displayed offensive personalities, propagating toxic discourse. Existing literature neglects the origin and evolution of LLM personalities, as well as how to control them effectively. To fill these gaps, we conducted a comprehensive investigation into LLM personality control. We investigated several typical methods of influencing LLMs: three training methods, namely Continual Pre-training, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF), along with inference-phase prompts. Our investigation revealed a hierarchy of control effectiveness: Prompt > SFT > RLHF > Continual Pre-training. Notably, SFT exhibits a higher control success rate than prompt induction. While prompts prove highly effective, we found that prompt-induced personalities are less robust than trained ones, making them more prone to exhibiting conflicting personalities under reverse personality prompt induction. In addition, harnessing the strengths of both SFT and prompts, we proposed Prompt Induction post Supervised Fine-tuning (PISF), which emerges as the most effective and robust strategy for controlling LLMs' personality, displaying high efficacy, high success rates, and high robustness. Even under reverse personality prompt induction, LLMs controlled by PISF still exhibit stable and robust personalities.
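
To make "control success rate" and "robustness under reverse personality prompt induction" concrete, here is a minimal sketch. It assumes each controlled model has been assessed several times, each run yielding a four-letter MBTI type, and that a run succeeds when the letter on the targeted dimension matches the target trait (e.g. 'E' for extroversion). The function names, the success criterion, the "robustness drop" measure, and the sample data are all hypothetical illustrations, not the paper's own metric definitions.

```python
from typing import Sequence

# Opposing trait pairs, one per MBTI dimension, in conventional letter order.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def trait_position(trait: str) -> int:
    """Return the index (0-3) of the MBTI dimension that contains `trait`."""
    for i, pair in enumerate(DIMENSIONS):
        if trait in pair:
            return i
    raise ValueError(f"unknown MBTI trait: {trait!r}")


def success_rate(assessed_types: Sequence[str], target: str) -> float:
    """Fraction of assessment runs whose type shows the target trait.

    Hypothetical criterion: a run succeeds when the letter at the target's
    dimension equals the target (e.g. 'E' for extroversion).
    """
    if not assessed_types:
        raise ValueError("need at least one assessment run")
    pos = trait_position(target)
    hits = sum(1 for t in assessed_types if t.upper()[pos] == target)
    return hits / len(assessed_types)


def robustness_drop(controlled: Sequence[str], reversed_: Sequence[str], target: str) -> float:
    """Drop in success rate when the controlled model is prompted toward the
    opposite trait; a smaller drop indicates a more robust personality."""
    return success_rate(controlled, target) - success_rate(reversed_, target)


# Hypothetical assessment results for a model controlled toward extroversion ('E').
controlled_runs = ["ENFP", "ESTJ", "ENTP", "ISFJ"]  # no reverse prompt
reverse_runs = ["INFP", "ESTJ", "ISTP", "INTJ"]     # with an introversion-inducing prompt

print(success_rate(controlled_runs, "E"))                   # 0.75
print(robustness_drop(controlled_runs, reverse_runs, "E"))  # 0.5
```

Under this framing, the abstract's robustness claim for PISF corresponds to a small drop: a robustly controlled model keeps showing the target trait even when a reverse prompt pushes toward its opposite.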

Paper Structure (20 sections, 5 equations, 9 figures, 17 tables)

Figures (9)

  • Figure 1: Overview. We embarked on a comprehensive investigation into personality control with typical methods to influence LLMs.
  • Figure 2: Instruction Data Generation with Prompt-induced LLMs. Following the Least-to-Most approach (zhou2023leasttomost), we split data generation into two stages: first crafting questions grounded in the Opposite Trait Description, then eliciting responses from prompt-induced LLMs.
  • Figure 3: Pretrain Data Distribution.
  • Figure 4: Personality Assessment Process. T stands for the 'Thinking' trait and F for the 'Feeling' trait.
  • Figure 5: Prompt Induction Performance of the Qwen and Llama2 families. Qwen models used the default generation configuration, while Llama2 models used greedy search.
  • ...and 4 more figures
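
Figure 4's caption names only the Thinking/Feeling dimension. As a rough illustration of how questionnaire answers can be reduced to an MBTI type, the sketch below tallies trait votes per dimension and keeps the majority letter. It assumes each answer has already been mapped to the single trait it supports; the function name `assess_type`, the tie-breaking rule, and the sample answers are hypothetical and are not taken from the paper's assessment procedure.

```python
from collections import Counter
from typing import Iterable

# Opposing trait pairs in conventional MBTI letter order.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]
VALID = {letter for pair in DIMENSIONS for letter in pair}


def assess_type(answers: Iterable[str]) -> str:
    """Aggregate per-item trait votes into a four-letter MBTI type.

    Each element of `answers` is the trait letter that one questionnaire
    answer supports, e.g. 'T' (Thinking) or 'F' (Feeling). Ties go to the
    first letter of each pair, an arbitrary choice made for illustration.
    """
    votes = Counter()
    for a in answers:
        letter = a.upper()
        if letter not in VALID:
            raise ValueError(f"unknown MBTI trait: {a!r}")
        votes[letter] += 1
    # Pick the majority letter on each dimension (missing letters count as 0).
    return "".join(
        first if votes[first] >= votes[second] else second
        for first, second in DIMENSIONS
    )


# Hypothetical answers, already mapped to the trait each one supports.
answers = ["E", "E", "I", "N", "S", "N", "F", "F", "T", "J", "P", "P"]
print(assess_type(answers))  # ENFP
```

Majority voting per dimension is only one simple aggregation; a weighted or Likert-scale scoring scheme would replace the vote counts with summed item scores.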