Secret Keepers: The Impact of LLMs on Linguistic Markers of Personal Traits

Zhivar Sourati, Meltem Ozcan, Colin McDaniel, Alireza Ziabari, Nuan Wen, Ala Tak, Fred Morstatter, Morteza Dehghani

TL;DR

This study investigates whether linguistic markers of personal traits remain informative when authors use LLMs as writing aids. Using three LLMs (GPT3.5, Llama 2, Gemini) and two prompts (Rephrase and Syntax_Grammar), it compares six demographic and psychological attributes (gender, age, political affiliation, personality, empathy, and morality) across original and LLM-generated texts, employing both data-driven classifiers and theory-driven lexical cues (LIWC, NRC, MFD2). The results show that LLM involvement slightly reduces predictive power and that significant declines are infrequent, although some lexical markers lose reliability and the degree to which semantic content is preserved varies with the LLM and prompt. These findings matter for language-based trait inference now that LLM-assisted writing is widespread: methodological choices and model selection can influence how reliably traits are predicted from language. They also raise considerations for privacy and for the interpretation of lexical markers when writing assistance tools are pervasive in everyday communication.

Abstract

Prior research has established associations between individuals' language usage and their personal traits; our linguistic patterns reveal information about our personalities, emotional states, and beliefs. However, with the increasing adoption of Large Language Models (LLMs) as writing assistants in everyday writing, a critical question emerges: are authors' linguistic patterns still predictive of their personal traits when LLMs are involved in the writing process? We investigate the impact of LLMs on the linguistic markers of demographic and psychological traits, specifically examining three LLMs - GPT3.5, Llama 2, and Gemini - across six different traits: gender, age, political affiliation, personality, empathy, and morality. Our findings indicate that although the use of LLMs slightly reduces the predictive power of linguistic patterns over authors' personal traits, the significant changes are infrequent, and the use of LLMs does not fully diminish the predictive power of authors' linguistic patterns over their personal traits. We also note that some theoretically established lexical-based linguistic markers lose their reliability as predictors when LLMs are used in the writing process. Our findings have important implications for the study of linguistic markers of personal traits in the age of LLMs.
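
As a rough illustration of what a lexical-based marker comparison can look like, the sketch below computes per-category word rates for an original text and a rewritten version, then reports the difference. It is not the authors' pipeline: the lexicon `TOY_LEXICON`, the helper names `category_rates` and `rate_shift`, and the example sentences are all hypothetical, and the real LIWC, NRC, and MFD2 dictionaries are far larger and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only.
# These are NOT the real LIWC, NRC, or MFD2 word lists.
TOY_LEXICON = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "positive_emotion": {"happy", "glad", "love", "great"},
}


def category_rates(text, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens in `text` found in that category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in lexicon}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / total
        for name, words in lexicon.items()
    }


def rate_shift(original, rewritten, lexicon=TOY_LEXICON):
    """Return the change in each category rate (rewritten minus original)."""
    before = category_rates(original, lexicon)
    after = category_rates(rewritten, lexicon)
    return {name: after[name] - before[name] for name in lexicon}


# Assumed example texts: a first-person sentence and a paraphrase of it.
original = "I love my dog and I am happy"
rewritten = "The author is fond of their dog and feels content"
shift = rate_shift(original, rewritten)
```

A negative value in `shift` means the category appears at a lower rate after the rewrite. In a real study, rates like these would be computed over many texts and related to author traits, rather than read off a single pair of sentences.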

Paper Structure (32 sections, 7 figures, 12 tables)

Figures (7)

  • Figure 1: Semantic similarity between original and LLM-generated texts (with Rephrase and Syntax_Grammar prompts) across different data sources and utilized LLMs.
  • Figure 2: The ratio of classifiers with unchanged predictive power after LLM rewrites across different author attributes. The left plot shows the aggregated view and the right plot shows the variability across two prompts: Rephrase and Syntax_Grammar.
  • Figure 3: The ratio of models with unchanged predictive power after LLM rewrite, across different author attributes and different LLMs.
  • Figure 4: Ratios of correct author attribute predictions on original texts that had different predicted labels on LLM-generated texts, grouped by the direction of change in predictions.
  • Figure 5: The ratio of unchanged predictive powers for two versions of the Empathetic Conversations dataset, one with aggregated essays per author and one containing all essays from the same author as individual observations.
  • ...and 2 more figures