Large Language Models Can Infer Psychological Dispositions of Social Media Users
Heinrich Peters, Sandra Matz
TL;DR
The study investigates whether zero-shot LLMs (GPT-3.5 and GPT-4) can infer the Big Five personality traits from Facebook status updates, and how accuracy varies by age and gender. Using 1,000 MyPersonality users, each with 200 recent status updates, the authors compare self-reported IPIP scores with LLM-derived trait scores and find overall correlations of $r_{\text{GPT-3.5}} = 0.27$ and $r_{\text{GPT-4}} = 0.31$, with Openness, Extraversion, and Agreeableness being the most detectable traits. Demographic analyses reveal gender and age biases: inferences are generally more accurate for women, while older users show mixed or weaker signals, though within-group correlations remain comparable. Comparison with third-party observer ratings indicates that LLM inferences are broadly similar in quality to human judgments, underscoring both the potential and the ethical challenges of automated psychometrics. The authors call for governance and privacy safeguards, and for further work to unpack the cues and mechanisms behind these zero-shot inferences and to improve accuracy on the less-inferable traits.
Abstract
Large Language Models (LLMs) demonstrate increasingly human-like abilities across a wide variety of tasks. In this paper, we investigate whether LLMs like ChatGPT can accurately infer the psychological dispositions of social media users and whether their ability to do so varies across socio-demographic groups. Specifically, we test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores, a level of accuracy that is similar to that of supervised machine learning models specifically trained to infer personality. Our findings also highlight heterogeneity in the accuracy of personality inferences across different age groups and gender categories: predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression. The ability of LLMs to infer psychological dispositions from user-generated text has the potential to democratize access to cheap and scalable psychometric assessments for both researchers and practitioners. On the one hand, this democratization might facilitate large-scale research with high ecological validity and spark innovation in personalized services. On the other hand, it also raises ethical concerns regarding user privacy and self-determination, highlighting the need for stringent ethical frameworks and regulation.
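The accuracy figures above are Pearson correlations between LLM-inferred and self-reported trait scores. The following is a minimal sketch of how such an evaluation could be computed, not the authors' actual pipeline. The function names, the trait keys, and the choice to take a plain arithmetic mean of the per-trait correlations are all assumptions made for illustration, and the demo scores are synthetic.

```python
import math
import random

# Hypothetical trait keys; the source only names the Big Five, not a data layout.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def per_trait_accuracy(self_report, inferred):
    """Map each trait to r(self-reported, inferred) across users.

    Both arguments are dicts: trait name -> list of scores, one per user,
    with users in the same order in both dicts.
    """
    return {t: pearson_r(self_report[t], inferred[t]) for t in TRAITS}


def mean_accuracy(per_trait):
    """Arithmetic mean of per-trait correlations (an assumed aggregation)."""
    return sum(per_trait.values()) / len(per_trait)


if __name__ == "__main__":
    # Synthetic data only: inferred scores are noisy copies of self-reports.
    rng = random.Random(0)
    self_report = {t: [rng.uniform(1, 5) for _ in range(1000)] for t in TRAITS}
    inferred = {t: [s + rng.gauss(0, 3) for s in self_report[t]] for t in TRAITS}
    scores = per_trait_accuracy(self_report, inferred)
    for trait, r in scores.items():
        print(f"{trait}: r = {r:.2f}")
    print(f"mean r = {mean_accuracy(scores):.2f}")
```

The same per-trait function could be applied separately to subsets of users (for example, by gender or age group) to examine the kind of heterogeneity in accuracy that the abstract describes.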
