Deterministic AI Agent Personality Expression through Standard Psychological Diagnostics

J. M. Diederik Kruijssen, Nicholas Emmons

TL;DR

This work tackles the problem of bland AI expressiveness by introducing a deterministic personality-expression framework grounded in standard psychological diagnostics (Big Five and MBTI). It combines structured system prompts with personality templates generated by a character-builder agent and evaluates performance across GPT-4o-based and reasoning-capable models (e.g., o1, o3-mini). The key finding is that higher-performing models express specified personalities with high accuracy via personality-based reasoning rather than per-question guessing, and that requiring motivations for answers tests the interplay between intelligence and reasoning. Fine-tuning adjusts communication style without significantly altering personality accuracy, while the openness dimension remains challenging to align with input traits. The work lays a foundation for diverse, human-like AI agents with verifiable personalities, enabling more engaging human-AI interactions, and points to ethical considerations and future work in expanding modalities and psychometric frameworks.
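The TL;DR mentions structured system prompts built from personality templates, but this page does not show what such a template contains. The sketch below assumes a hypothetical template that holds a qualitative level for each Big Five dimension plus a four-letter MBTI type, validates it, and renders it into a system prompt. `PersonalityTemplate`, `build_system_prompt`, the "low/medium/high" scale, and the prompt wording are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical template layout; the paper's actual template format is not given here.
BIG_FIVE = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")
MBTI_AXES = ("EI", "SN", "TF", "JP")  # allowed letter at each MBTI position


@dataclass(frozen=True)
class PersonalityTemplate:
    name: str
    big_five: dict[str, str]  # dimension -> "low" | "medium" | "high" (assumed scale)
    mbti: str                 # four-letter type, e.g. "INFJ"

    def __post_init__(self) -> None:
        # Reject incomplete templates so every prompt specifies all five dimensions.
        missing = set(BIG_FIVE) - set(self.big_five)
        if missing:
            raise ValueError(f"missing Big Five dimensions: {sorted(missing)}")
        # Each MBTI letter must come from the corresponding axis pair.
        letters = self.mbti.upper()
        if len(letters) != 4 or any(c not in axis for c, axis in zip(letters, MBTI_AXES)):
            raise ValueError(f"invalid MBTI type: {self.mbti!r}")


def build_system_prompt(t: PersonalityTemplate) -> str:
    """Render the template as a fixed, line-per-trait system prompt."""
    lines = [f"You are {t.name}. Express the following personality consistently."]
    lines += [f"- {dim.capitalize()}: {t.big_five[dim]}" for dim in BIG_FIVE]
    lines.append(f"- Myers-Briggs type: {t.mbti.upper()}")
    return "\n".join(lines)


# Usage with an invented example agent.
agent = PersonalityTemplate(
    name="Ava",
    big_five={"openness": "high", "conscientiousness": "medium", "extraversion": "low",
              "agreeableness": "high", "neuroticism": "low"},
    mbti="infj",
)
prompt = build_system_prompt(agent)
print(prompt)
```

Validating in `__post_init__` rejects an incomplete or malformed template before any prompt is built, and rendering the traits in a fixed order means the same template always yields the same prompt text.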

Abstract

Artificial intelligence (AI) systems powered by large language models have become increasingly prevalent in modern society, enabling a wide range of applications through natural language interaction. As AI agents proliferate in our daily lives, their generic and uniform expressiveness presents a significant limitation to their appeal and adoption. Personality expression represents a key prerequisite for creating more human-like and distinctive AI systems. We show that AI models can express deterministic and consistent personalities when instructed using established psychological frameworks, with varying degrees of accuracy depending on model capabilities. We find that more advanced models like GPT-4o and o1 demonstrate the highest accuracy in expressing specified personalities across both Big Five and Myers-Briggs assessments, and further analysis suggests that personality expression emerges from a combination of intelligence and reasoning capabilities. Our results reveal that personality expression operates through holistic reasoning rather than question-by-question optimization, with response-scale metrics showing higher variance than test-scale metrics. Furthermore, we find that model fine-tuning affects communication style independently of personality expression accuracy. These findings establish a foundation for creating AI agents with diverse and consistent personalities, which could significantly enhance human-AI interaction across applications from education to healthcare, while also enabling a broader range of distinctive AI agents. The ability to quantitatively assess and implement personality expression in AI systems opens new avenues for research into more relatable, trustworthy, and ethically designed AI.
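The abstract reports that response-scale metrics vary more than test-scale metrics. One simple mechanism consistent with that observation is aggregation: a test score averages many item responses, so item-level scatter partially cancels. The toy simulation below assumes a 1-5 Likert scale, alternating reverse-keyed items, and Gaussian answer noise (all hypothetical choices, not taken from the paper) and compares the variance of item-level and test-level errors relative to a target trait level. It illustrates the statistics of averaging only; it does not model the holistic reasoning the authors describe.

```python
import random
import statistics

LIKERT_MAX = 5  # assumed 1-5 Likert scale


def score_item(response: int, reverse: bool) -> int:
    """Map a raw Likert response onto the trait direction; reverse-keyed items are flipped."""
    return LIKERT_MAX + 1 - response if reverse else response


def simulate_test(target: float, n_items: int, noise: float,
                  rng: random.Random) -> list[tuple[int, bool]]:
    """Toy agent: each trait-aligned answer is the target level plus Gaussian noise,
    rounded and clipped to the scale. Odd-numbered items are reverse-keyed, so the
    raw response recorded for them is the flipped value."""
    answers = []
    for i in range(n_items):
        reverse = i % 2 == 1
        aligned = min(LIKERT_MAX, max(1, round(target + rng.gauss(0.0, noise))))
        answers.append((score_item(aligned, reverse), reverse))
    return answers


def scale_score(answers: list[tuple[int, bool]]) -> float:
    """Test-scale outcome: mean of the keyed item scores."""
    return statistics.fmean(score_item(r, rev) for r, rev in answers)


rng = random.Random(0)  # fixed seed so the run is reproducible
target, n_items, n_tests = 4.0, 10, 200
item_errors: list[float] = []
test_errors: list[float] = []
for _ in range(n_tests):
    answers = simulate_test(target, n_items, noise=1.0, rng=rng)
    item_errors += [score_item(r, rev) - target for r, rev in answers]
    test_errors.append(scale_score(answers) - target)

response_var = statistics.pvariance(item_errors)
test_var = statistics.pvariance(test_errors)
print(f"response-scale error variance: {response_var:.3f}")
print(f"test-scale error variance:     {test_var:.3f}")
```

With independent item noise, the test-scale variance shrinks roughly in proportion to the number of items, which is why a noisy answer sheet can still produce a stable overall score.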

Paper Structure

This paper contains 25 sections, 7 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Big Five test outcomes as a function of the input personality type, for different AI models (rows) and personality dimensions (columns). The five metrics used to evaluate the accuracy of the personality expression for a specific model and dimension are listed in the top-left corner of each panel. Symbol colours and shapes indicate different agents. The figure shows clear correspondence between input and output personality types, with the most accurate personality expressions being achieved by the GPT-4o and o1 models.
  • Figure 2: Big Five test responses as a function of the input personality type probed by the question, for different AI models (rows) and personality dimensions (columns). The five metrics used to evaluate the accuracy of the personality expression for a specific model and dimension are listed below each panel. The figure shows that highly accurate personality expression is not achieved by perfect question-level accuracy, which would have resulted in diagonal confusion matrices, but rather by a personality-based reasoning process.
  • Figure 3: Big Five test outcomes as a function of the input personality type, for different AI models (rows) and personality dimensions (columns), in a set of experiments where the agents are required to provide a motivation for each answer. The five metrics used to evaluate the accuracy of the personality expression for a specific model and dimension are listed in the top-left corner of each panel. Symbol colours and shapes indicate different agents. The figure shows varying impacts of motivation on the accuracy of personality expression for different models, suggesting that a linear combination of intelligence and reasoning capabilities can be used to understand the performance of the agents (see the text for details).
  • Figure 4: Big Five test responses as a function of the input personality type probed by the question, for different AI models (rows) and personality dimensions (columns), in a set of experiments where the agents are required to provide a motivation for each answer. The five metrics used to evaluate the accuracy of the personality expression for a specific model and dimension are listed below each panel. The non-diagonal confusion matrices show that highly accurate personality expression is not achieved by perfect question-level accuracy, but by a personality-based reasoning process.
  • Figure 5: Big Five test outcomes as a function of the input personality type, for experiments without (top row) and with (bottom row) motivation, in a set of experiments where the agents' model has been fine-tuned to generate a specific mode of communication. The five metrics used to evaluate the accuracy of the personality expression for a specific model and dimension are listed in the top-left corner of each panel. Symbol colours and shapes indicate different agents. The figure shows that fine-tuning does not affect the accuracy of the agents' personality expression and only changes their mode of communication.
  • ...and 3 more figures
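The captions of Figures 2 and 4 argue that accurate personality expression does not require diagonal confusion matrices. The sketch below makes that concrete with invented toy responses on an assumed 1-5 scale: rows of the confusion matrix are the input level a question probes, and columns are the response given. Only a fifth of the answers fall on the diagonal, yet the mean response for each input level matches that level exactly. The helper names and data are hypothetical, and the two quantities computed here are not the paper's five evaluation metrics, which this page does not specify.

```python
from collections import Counter

LEVELS = range(1, 6)  # assumed 1-5 levels for both the probed input and the response


def confusion_matrix(pairs):
    """pairs: iterable of (input_level, response_level).
    Returns a 5x5 list of counts: rows = input level probed, columns = response."""
    counts = Counter(pairs)
    return [[counts[(i, r)] for r in LEVELS] for i in LEVELS]


def diagonal_fraction(matrix):
    """Question-level accuracy: share of responses that exactly match the input level."""
    total = sum(map(sum, matrix))
    return sum(matrix[k][k] for k in range(len(matrix))) / total


def mean_response_per_input(matrix):
    """Test-level view: average response for each input level that has any answers."""
    means = {}
    for i, row in zip(LEVELS, matrix):
        n = sum(row)
        if n:
            means[i] = sum(r * c for r, c in zip(LEVELS, row)) / n
    return means


# Invented toy data: answers scatter around the input level instead of matching it.
toy_responses = {2: [1, 3, 2, 1, 3], 3: [2, 4, 2, 4, 3], 4: [3, 5, 3, 5, 4]}
pairs = [(i, r) for i, rs in toy_responses.items() for r in rs]
m = confusion_matrix(pairs)
print(f"question-level (diagonal) accuracy: {diagonal_fraction(m):.2f}")  # 0.20
print(mean_response_per_input(m))  # {2: 2.0, 3: 3.0, 4: 4.0}
```

A perfectly diagonal matrix is sufficient for an accurate test outcome but not necessary: symmetric scatter around the intended level averages out at the test scale.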