Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content
Dana Sotto Porat, Ella Rabinovich
TL;DR
This work investigates whether modern LLMs exhibit personality- and demographic-like traits in the text they generate, using a data-driven approach that does not rely on self-report questionnaires. Applying automatic Big Five trait classifiers and a DistilBERT-based gender detector to responses to thousands of Reddit-derived prompts, the authors compare six models (open and closed) with human-authored text. LLMs consistently display higher Agreeableness and lower Neuroticism than humans, while Openness and Extroversion are largely similar. Gendered language patterns in model text broadly align with those of humans but show reduced variation. The authors release a curated Reddit-derived dataset and code, enabling large-scale, controlled analyses of personality and demographic signals in AI-generated text, with implications for understanding sociolinguistic cues in AI and for informing responsible deployment.
Abstract
Generative large language models (LLMs) have become central to everyday life, producing human-like text across diverse domains. A growing body of research investigates whether these models also exhibit personality- and demographic-like characteristics in their language. In this work, we introduce a novel, data-driven methodology for assessing LLM personality without relying on self-report questionnaires, instead applying automatic personality and gender classifiers to model replies to open-ended questions collected from Reddit. Comparing six widely used models to human-authored responses, we find that LLMs systematically express higher Agreeableness and lower Neuroticism, reflecting cooperative and stable conversational tendencies. Gendered language patterns in model text broadly resemble those of human writers, though with reduced variation, echoing prior findings on automated agents. We contribute a new dataset of human and model responses, along with large-scale comparative analyses, shedding new light on personality and demographic patterns in generative AI.
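The core comparison described above, classifier scores on model replies versus human replies, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each classifier has already produced one numeric score per response for each trait, and it summarizes the gap per trait with Cohen's d, an effect size chosen here for illustration. The function names `cohens_d` and `compare_traits` and the dictionary input layout are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference (mean(a) - mean(b)) using the pooled SD.

    Each group needs at least two scores so that stdev() is defined.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def compare_traits(model_scores, human_scores):
    """Map each trait to (model mean, human mean, Cohen's d of model vs. human).

    Both arguments are assumed to be dicts of trait name -> list of
    per-response classifier scores (a hypothetical layout).
    """
    report = {}
    for trait, m in model_scores.items():
        h = human_scores[trait]
        report[trait] = (mean(m), mean(h), cohens_d(m, h))
    return report


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    model = {"Agreeableness": [0.8, 0.7, 0.9], "Neuroticism": [0.2, 0.3, 0.1]}
    human = {"Agreeableness": [0.5, 0.6, 0.4], "Neuroticism": [0.5, 0.4, 0.6]}
    for trait, (m_mean, h_mean, d) in compare_traits(model, human).items():
        print(f"{trait}: model={m_mean:.2f} human={h_mean:.2f} d={d:.2f}")
```

A positive d means the model group scores higher on that trait than the human group. A standardized effect size is used so that traits whose classifiers output on different scales can be compared side by side.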
