
Driving Generative Agents With Their Personality

Lawrence J. Klinkert, Stephanie Buongiorno, Corey Clark

TL;DR

This work investigates grounding NPC behavior in affective-computing–driven psychometrics by prompting large language models with quantified personality profiles. It adopts the Big Five framework, represented as a 5-tuple $(O, C, E, A, N)$, and evaluates alignment with human IPIP-derived data, using $CS = A + C + (1 - N)$ and $CF = E + O$ to map personality space. Through synthetic data generation across several LLMs and rigorous labeling via nearest-neighbor distance in personality space, the study demonstrates that GPT-4-0613 most accurately embodies specified personalities (approx. 74% accuracy) and yields lower RMSPE, supporting the feasibility of personality-grounded NPCs. The findings suggest a practical pathway to richer, emotionally nuanced NPCs in games, with future work extending to additional psychometric dimensions and targeted fine-tuning to improve reliability and realism in dynamic gameplay contexts.
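The two mapping formulas and the nearest-neighbor labeling step can be illustrated with a minimal sketch. It assumes each trait is normalized to the range 0 to 1 and that "nearest" means Euclidean distance over the full 5-tuple; the summary does not confirm either choice, and the function names and reference profiles below are hypothetical.

```python
from math import dist

# A Big Five profile as (O, C, E, A, N).
# Assumption: every trait value is normalized to [0, 1].
Profile = tuple[float, float, float, float, float]


def cs(profile: Profile) -> float:
    """CS = A + C + (1 - N), as given in the summary."""
    o, c, e, a, n = profile
    return a + c + (1.0 - n)


def cf(profile: Profile) -> float:
    """CF = E + O, as given in the summary."""
    o, c, e, a, n = profile
    return e + o


def nearest_label(sample: Profile, references: dict[str, Profile]) -> str:
    """Label a measured profile with the name of the closest reference profile.

    Assumption: Euclidean distance over all five traits. The paper may
    measure distance differently (for example in the 2-D CS/CF plane).
    """
    return min(references, key=lambda name: dist(sample, references[name]))


if __name__ == "__main__":
    # Hypothetical reference profiles, for illustration only.
    refs = {
        "calm": (0.5, 0.5, 0.5, 0.5, 0.1),
        "anxious": (0.5, 0.5, 0.5, 0.5, 0.9),
    }
    measured = (0.5, 0.6, 0.4, 0.5, 0.2)
    print(cs(measured), cf(measured), nearest_label(measured, refs))
```

Under these assumptions, a generated response counts as correct when the label of its measured profile matches the profile that was supplied in the prompt.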

Abstract

This research explores the potential of Large Language Models (LLMs) to utilize psychometric values, specifically personality information, within the context of video game character development. Affective Computing (AC) systems quantify a Non-Player Character's (NPC) psyche, and an LLM can take advantage of the system's information by using the values for prompt generation. The research shows an LLM can consistently represent a given personality profile, thereby enhancing the human-like characteristics of game characters. Repurposing a human examination, the International Personality Item Pool (IPIP) questionnaire, to evaluate an LLM shows that the model can accurately generate content consistent with the personality provided. Results show that improved LLMs, such as the latest GPT-4 model, can consistently utilize and interpret a personality to represent behavior.
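As a rough illustration of using personality values for prompt generation, the sketch below turns a Big Five tuple into a text prompt that asks the model to rate one questionnaire item. The prompt wording, the 0 to 1 trait scale, the 1 to 5 rating scale, and the function name `build_personality_prompt` are all assumptions made for this example; they are not the authors' actual prompt.

```python
TRAITS = (
    "Openness",
    "Conscientiousness",
    "Extraversion",
    "Agreeableness",
    "Neuroticism",
)


def build_personality_prompt(profile: tuple[float, ...], item: str) -> str:
    """Build a prompt asking a character with `profile` to rate one item.

    Assumptions: trait values lie in [0, 1] and the answer is a 1-5 rating.
    The wording is hypothetical, not taken from the paper.
    """
    if len(profile) != len(TRAITS):
        raise ValueError("profile must contain five values: (O, C, E, A, N)")
    trait_lines = [
        f"- {name}: {value:.2f}" for name, value in zip(TRAITS, profile)
    ]
    return (
        "You are a game character with the following Big Five personality "
        "scores (0 = very low, 1 = very high):\n"
        + "\n".join(trait_lines)
        + "\n\nRate how well this statement describes you on a scale of "
        + f'1 to 5: "{item}"'
    )


if __name__ == "__main__":
    # Sample questionnaire-style statement, used only as an illustration.
    print(build_personality_prompt((0.5, 0.8, 0.3, 0.6, 0.2),
                                   "I am the life of the party."))
```

The resulting string would be sent to whichever LLM is under test; collecting the ratings across all questionnaire items yields a measured profile that can be compared with the one supplied.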

Paper Structure

This paper contains 7 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Van Mensvoort's 20 personality profiles, plotted using CS vs. CF (Equations 1 and 2) [mensvoort_system_nodate].
  • Figure 2: Nine plots of the evaluated test results using the CS vs. CF schema (Equations 1 and 2). The green squares represent the baseline results. Each row is a different personality profile, represented by the magenta X. Each column is a different LLM test result, starting with davinci, turbo, then gpt-4, represented by blue, lime, and purple circles, respectively.
  • Figure 3: Nine plots of the 50 test responses using LDA. The green squares represent the baseline responses. Each row is a different personality profile. Each column is a different LLM test response, starting with davinci, turbo, then gpt-4, represented by blue, lime, and purple circles, respectively.
  • Figure 4: Violin plots comparing each personality profile (red point) with the evaluated test results from the baseline (green), davinci (blue), turbo (lime), and gpt-4 (purple) generated data, along one of the personality factors.