Large Language Models Can Infer Personality from Free-Form User Interactions
Heinrich Peters, Moran Cerf, Sandra C. Matz
TL;DR
The paper investigates whether large language models can infer Big Five personality traits from free-form conversations, and how prompt design and interaction mode affect accuracy and user experience. Using a 3x2 between-subjects design with GPT-4 that crosses three prompts (assessment, acquaintance, and assistant) with two interaction modes, the study correlates the chatbot's inferences with BFI-2 scores and collects user experience data. Inferences are strongest when the chatbot is prompted to assess personality, and meaningful signals are also present in more naturalistic interactions. User experience was comparably positive in the assessment and naturalistic conditions, while the assistant condition yielded weaker inferences and lower experience ratings. The work demonstrates the potential for scalable psychological profiling through conversation, and it highlights ethical and privacy challenges that call for thoughtful governance as these capabilities grow.
Abstract
This study investigates the capacity of Large Language Models (LLMs) to infer the Big Five personality traits from free-form user interactions. The results demonstrate that a chatbot powered by GPT-4 can infer personality with moderate accuracy, outperforming previous approaches drawing inferences from static text content. The accuracy of inferences varied across different conversational settings. Performance was highest when the chatbot was prompted to elicit personality-relevant information from users (mean r=.443, range=[.245, .640]), followed by a condition placing greater emphasis on naturalistic interaction (mean r=.218, range=[.066, .373]). Notably, the direct focus on personality assessment did not result in a less positive user experience, with participants reporting the interactions to be equally natural, pleasant, engaging, and humanlike across both conditions. A chatbot mimicking ChatGPT's default behavior of acting as a helpful assistant led to markedly inferior personality inferences and lower user experience ratings but still captured psychologically meaningful information for some of the personality traits (mean r=.117, range=[-.004, .209]). Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across different socio-demographic subgroups. Our results highlight the potential of LLMs for psychological profiling based on conversational interactions. We discuss practical implications and ethical challenges associated with these findings.
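The accuracy figures above (a mean r and a range per condition) summarize correlations across the five traits. A minimal sketch of that kind of summary follows, assuming one Pearson correlation per trait between inferred and self-reported scores and a plain arithmetic mean across traits. The abstract does not state the exact aggregation procedure, and the names `pearson_r`, `summarize_accuracy`, and `TRAITS` are hypothetical, chosen only for illustration.

```python
from math import sqrt

# Big Five trait labels (illustrative keys, not taken from the paper's materials).
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def summarize_accuracy(inferred, self_report):
    """Per-trait r, plus the mean and (min, max) range across traits.

    Assumption: the mean is a simple arithmetic mean of the five
    correlations; a Fisher z-transformed average is a common alternative.
    """
    per_trait = {t: pearson_r(inferred[t], self_report[t]) for t in TRAITS}
    values = list(per_trait.values())
    return {
        "per_trait": per_trait,
        "mean_r": sum(values) / len(values),
        "range": (min(values), max(values)),
    }
```

Each argument maps a trait name to a list of scores with one entry per participant, in the same participant order for both inputs. Running the function once per experimental condition would give one mean and range per condition, in the form reported above.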
