Can LLMs Assess Personality? Validating Conversational AI for Trait Profiling
Andrius Matšenas, Anet Lello, Tõnis Lees, Hans Peep, Kim Lilii Tamm
TL;DR
The paper tackles the limitations of static self-report personality assessment by validating real-time, guided LLM conversations as a dynamic alternative for Big Five profiling. Using a within-subject design (N=33), it compares LLM-derived trait scores against the IPIP-50 gold standard and measures user-perceived accuracy. Results show moderate convergent validity ($r \in [0.38,0.58]$) with three traits (Conscientiousness, Openness, Neuroticism) statistically equivalent across methods, while Agreeableness and Extraversion show trait-specific differences; participants rate both methods as equally accurate. The work contributes a validation framework for conversational psychometrics and highlights practical potential for consumer applications, albeit with calibration needs for certain traits and limitations in generalizability.
Abstract
This study validates Large Language Models (LLMs) as a dynamic alternative to questionnaire-based personality assessment. Using a within-subjects experiment (N=33), we compared Big Five personality scores derived from guided LLM conversations against the gold-standard IPIP-50 questionnaire, while also measuring user-perceived accuracy. Results indicate moderate convergent validity (r = 0.38 to 0.58 across traits), with Conscientiousness, Openness, and Neuroticism scores statistically equivalent between methods. Agreeableness and Extraversion showed significant differences, suggesting trait-specific calibration is needed. Notably, participants rated LLM-generated profiles and traditional questionnaire results as equally accurate. These findings suggest conversational AI offers a promising alternative to traditional questionnaire-based psychometrics.
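
The two statistical claims above (convergent validity and between-method equivalence) can be illustrated with a minimal sketch. It assumes that convergent validity is a Pearson correlation between per-participant trait scores from the two methods, and that equivalence is checked with a paired two one-sided tests (TOST) procedure; the abstract does not name the exact tests, so both are assumptions. The function names, the equivalence bound, the critical value, and the toy scores are all hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def paired_tost(x, y, bound, t_crit):
    """Paired TOST: True if the mean difference is within +/- bound.

    bound  -- equivalence margin in score units (assumed, not from the paper)
    t_crit -- one-sided critical t value for df = n - 1 at the chosen alpha,
              supplied by the caller (e.g. from a t table)
    """
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))
    d = mean(diffs)
    t_lower = (d + bound) / se  # tests H0: mean difference <= -bound
    t_upper = (d - bound) / se  # tests H0: mean difference >= +bound
    # Equivalence is claimed only if both one-sided nulls are rejected.
    return t_lower > t_crit and t_upper < -t_crit


# Toy scores for one trait (illustrative only, not study data).
llm_scores = [3.0, 3.5, 4.0, 2.5, 3.2]
ipip_scores = [3.1, 3.4, 4.1, 2.6, 3.1]

r = pearson_r(llm_scores, ipip_scores)
# 2.132 is the one-sided 5% critical t value for df = 4.
equivalent = paired_tost(llm_scores, ipip_scores, bound=0.5, t_crit=2.132)
```

Passing the critical value in explicitly keeps the sketch free of third-party dependencies; in practice a statistics library would compute the p-values directly. The two checks answer different questions: a trait can correlate well across methods yet still fail equivalence if one method scores systematically higher.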
