PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Hang Jiang; Xiajie Zhang; Xubo Cao; Cynthia Breazeal; Deb Roy; Jad Kabbara

TL;DR

This work investigates whether LLM-based personas can faithfully express assigned Big Five personality traits. The authors use GPT-3.5 and GPT-4 to generate 320 personas, administer the Big Five Inventory (BFI), and analyze 800-word stories with LIWC, human and LLM evaluations, and personality-prediction tasks. Key findings: self-reported BFI scores align with the assigned traits, the linguistic features of the stories mirror trait-associated patterns, and both humans and LLMs can discern traits from the narratives, although accuracy drops when annotators are told the author is an AI. The results advance understanding of personality expressivity in LLMs and inform ethical and design considerations for personalized AI systems.

Abstract

Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We study the behavior of LLM-based agents, which we refer to as LLM personas, and present a case study with GPT-3.5 and GPT-4 to investigate whether LLMs can generate content that aligns with their assigned personality profiles. To this end, we simulate distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story writing task, and then assess their stories with automatic and human evaluations. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across all five traits. Additionally, LLM personas' writings show emerging linguistic patterns representative of personality traits when compared with a human writing corpus. Furthermore, human evaluation shows that humans can perceive some personality traits with an accuracy of up to 80%. Interestingly, the accuracy drops significantly when the annotators are informed of AI authorship.
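
The persona setup and BFI scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes each Big Five trait is assigned a binary high/low level (giving 2^5 = 32 persona types), that BFI items are answered on a 1 to 5 Likert scale with some reverse-keyed items, and that effect sizes are measured with Cohen's d using a pooled standard deviation. The function names and the item numbers in the usage note are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance

TRAITS = ["extraversion", "agreeableness", "conscientiousness",
          "neuroticism", "openness"]


def persona_types():
    """Enumerate every high/low combination of the five traits.

    Assumption: binary trait levels, so there are 2**5 = 32 types.
    """
    return [dict(zip(TRAITS, levels))
            for levels in product(("low", "high"), repeat=len(TRAITS))]


def score_trait(responses, items, reverse_items, scale_max=5):
    """Mean Likert score for one trait.

    responses: dict mapping item number -> answer in 1..scale_max.
    Reverse-keyed items are flipped (1 <-> 5, 2 <-> 4, ...).
    """
    values = []
    for item in items:
        answer = responses[item]
        if item in reverse_items:
            answer = scale_max + 1 - answer
        values.append(answer)
    return mean(values)


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

For example, `score_trait({1: 5, 2: 1}, items=[1, 2], reverse_items={2})` treats hypothetical item 2 as reverse-keyed, so both answers count as 5. Comparing the trait scores of personas assigned "high" against those assigned "low" with `cohens_d` is one way to quantify how strongly the assigned level shows up in self-reports.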

Paper Structure (43 sections, 18 figures, 7 tables)

Introduction
Experiment Design
Experiment Setup
Model Settings
LLM Persona Simulation
BFI Personality Test
Storywriting
Evaluation Methods
LIWC Analysis
Story Evaluation
Personality Prediction
Results
RQ1: Behavior in BFI Assessment
RQ2: Linguistic Patterns in Writing
RQ3: Story Evaluation
...and 28 more sections

Figures (18)

Figure 1: Illustration of the core workflow of the paper. The left section presents the prompts designed to create LLM personas. The center section shows the prompt used to instruct models to write stories. The right section outlines the three-pronged analytical approach: LIWC analysis, story evaluation, and text-based personality prediction.
Figure 2: BFI assessment in five personality dimensions by GPT-3.5 and GPT-4 personas. Significant statistical differences are found across all dimensions.
Figure 3: Individual accuracy of human and LLM evaluators in predicting personality.
Figure 4: Collective accuracy of human and LLM evaluators in predicting personality with majority votes.
Figure 7: Consent form on Prolific.
...and 13 more figures
