Designing AI-Agents with Personalities: A Psychometric Approach

Muhua Huang, Xijuan Zhang, Christopher Soto, James Evans

TL;DR

This work presents a psychometric pipeline to instantiate AI-Agents with Big Five personality traits using validated measures (BFI-2, Mini-Markers). Through three interconnected studies, the authors demonstrate that embedding-based representations of personality constructs are semantically coherent across instruments (Study 1), that AI-Agents built with BFI-2 formats align more closely with human responses on the Mini-Markers than agents built with adjective-based prompts (Study 2), and that expanded natural-language prompts yield the strongest human-like patterns when predicting risk-taking and moral decision-making in vignettes (Study 3). Across analyses, newer LLMs improve alignment with human data in continuous formats, yet finer-pattern fidelity remains imperfect, and safety-tuning in advanced models biases moral judgments. The results support the use of AI-Agents as scalable tools for pilot and preliminary research, while emphasizing caution for high-stakes conclusions and the need for further refinement of trait-expression and contextual behavior. Overall, the study offers a principled, scalable methodology for simulating personality in AI that holds promise for psychology and social science research, with clear guidance on formats and models that maximize human-like validity.

Abstract

We introduce a methodology for assigning quantifiable and psychometrically validated personalities to AI-Agents using the Big Five framework. Across three studies, we evaluate its feasibility and limitations. In Study 1, we show that large language models (LLMs) capture semantic similarities among Big Five measures, providing a basis for personality assignment. In Study 2, we create AI-Agents using prompts designed based on the Big Five Inventory-2 (BFI-2) in different formats, and find that AI-Agents powered by newer models align more closely with human responses on the Mini-Markers test, although the finer patterns of results (e.g., factor loading patterns) were sometimes inconsistent. In Study 3, we validate our AI-Agents on risk-taking and moral dilemma vignettes, finding that models prompted with the BFI-2-Expanded format most closely reproduce human personality-decision associations, while safety-aligned models generally inflate 'moral' ratings. Overall, our results show that AI-Agents align with humans in correlations between input Big Five traits and output responses and may serve as useful tools for preliminary research. Nevertheless, discrepancies in finer response patterns indicate that AI-Agents cannot (yet) fully substitute for human participants in precision or high-stakes projects.

Paper Structure

This paper contains 45 sections, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Cosine Similarity Between Personality Tests: Overall Average and Domain-Specific Comparisons
  • Figure 2: Two-Dimensional Projection of Big Five Personality Test Domain Embeddings Using t-SNE
  • Figure 3: Mini-Markers' Conscientiousness Scores For Human and AI-Agents
  • Figure 4: Selected Scores on Vignettes For Human and AI-Agents
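
Figure 1 reports cosine similarity between personality tests. As a rough illustration of that kind of comparison, the sketch below averages item embeddings into one vector per test and domain, then compares the resulting vectors. This is not the authors' code: the averaging step, the names `test_a` and `test_b`, and the three-dimensional vectors are all assumptions invented for the example. Real embeddings would come from a language model and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical toy "item embeddings", keyed by (test, domain).
# The values are made up for illustration and carry no empirical meaning.
toy_item_embeddings = {
    ("test_a", "extraversion"): [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    ("test_b", "extraversion"): [[0.7, 0.3, 0.1]],
    ("test_b", "conscientiousness"): [[0.1, 0.9, 0.2]],
}

# Assumption: one vector per (test, domain), obtained by averaging its items.
domain_vectors = {
    key: mean_vector(items) for key, items in toy_item_embeddings.items()
}

# Same domain measured by two different tests.
same_domain = cosine_similarity(
    domain_vectors[("test_a", "extraversion")],
    domain_vectors[("test_b", "extraversion")],
)

# Different domains measured by two different tests.
cross_domain = cosine_similarity(
    domain_vectors[("test_a", "extraversion")],
    domain_vectors[("test_b", "conscientiousness")],
)

print(f"same-domain similarity:  {same_domain:.3f}")
print(f"cross-domain similarity: {cross_domain:.3f}")
```

With these toy vectors the same-domain pair scores higher than the cross-domain pair, which is the qualitative pattern one would look for when checking whether two instruments describe a domain in semantically similar terms.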