Secret Keepers: The Impact of LLMs on Linguistic Markers of Personal Traits
Zhivar Sourati, Meltem Ozcan, Colin McDaniel, Alireza Ziabari, Nuan Wen, Ala Tak, Fred Morstatter, Morteza Dehghani
TL;DR
This study investigates whether linguistic markers of personal traits remain informative when authors use LLMs as writing aids. Using three LLMs (GPT3.5, Llama 2, Gemini) and two prompts, it compares how well six demographic and psychological traits can be predicted from original versus LLM-generated texts, employing both data-driven classifiers and theory-driven lexical cues (LIWC, NRC, MFD2). The results show that LLM involvement slightly reduces predictive power, but significant declines are infrequent; some markers lose reliability, and how well semantic content is preserved varies with the LLM and prompt. These findings matter for language-based trait inference in an era of widespread LLM-assisted writing: methodological choices and model selection can influence the reliability of trait predictions from language. They also raise considerations for privacy and for the interpretation of lexical markers when writing assistance tools are pervasive in everyday communication.
Abstract
Prior research has established associations between individuals' language usage and their personal traits; our linguistic patterns reveal information about our personalities, emotional states, and beliefs. However, with the increasing adoption of Large Language Models (LLMs) as writing assistants in everyday writing, a critical question emerges: are authors' linguistic patterns still predictive of their personal traits when LLMs are involved in the writing process? We investigate the impact of LLMs on the linguistic markers of demographic and psychological traits, examining three LLMs (GPT3.5, Llama 2, and Gemini) across six traits: gender, age, political affiliation, personality, empathy, and morality. Our findings indicate that although the use of LLMs slightly reduces the predictive power of linguistic patterns over authors' personal traits, significant changes are infrequent, and the use of LLMs does not fully eliminate that predictive power. We also note that some theoretically established lexical markers lose their reliability as predictors when LLMs are used in the writing process. Our findings have important implications for the study of linguistic markers of personal traits in the age of LLMs.
