Do Large Language Models Adapt to Language Variation across Socioeconomic Status?

Elisa Bassignana, Mike Zhang, Dirk Hovy, Amanda Cercas Curry

TL;DR

The study probes whether large language models can adapt their linguistic style to different SES communities on social media. By constructing SES-stratified Reddit and YouTube datasets and prompting four LLMs with three prompting strategies, the authors quantify stylistic alignment across 94 sociolinguistic features. They find that LLMs adjust only weakly to SES, often producing an approximation or caricature of the target style, emulate upper-SES style more effectively than lower-SES style, and gain little from longer input context. The results raise concerns about AI-driven amplification of linguistic hierarchies, challenge the use of LLMs for agent-based social simulations, and highlight the need to consider SES representation in language-enabled communication tools. The work contributes a publicly available SES-differentiated dataset and a feature-driven analysis across multiple models and prompts that exposes the SES-adaptation gap in LLM-generated language.

Abstract

Humans adjust their linguistic style to the audience they are addressing. However, the extent to which LLMs adapt to different social contexts is largely unknown. As these models increasingly mediate human-to-human communication, their failure to adapt to diverse styles can perpetuate stereotypes and marginalize communities whose linguistic norms are less closely mirrored by the models, thereby reinforcing social stratification. We study the extent to which LLMs integrate into social media communication across different socioeconomic status (SES) communities. We collect a novel dataset from Reddit and YouTube, stratified by SES. We prompt four LLMs with incomplete text from that corpus and compare the LLM-generated completions to the originals along 94 sociolinguistic metrics, including syntactic, rhetorical, and lexical features. LLMs modulate their style with respect to SES to only a minor extent, often resulting in approximation or caricature, and tend to emulate the style of upper SES more effectively. Our findings (1) show how LLMs risk amplifying linguistic hierarchies and (2) call into question their validity for agent-based social simulation, survey experiments, and any research relying on language style as a social signal.

Paper Structure (50 sections, 10 figures, 3 tables)


Figures (10)

  • Figure 1: We compare the style of LLM-generated completions against the original text from lower and upper SES communities on Reddit and YouTube along 94 sociolinguistic dimensions.
  • Figure 2: Forest Plots Comparing Linguistic Features of Humans and Models on Reddit. We only show the linguistic features (31) with a statistically significant difference after correction ($p<0.01$; Mann and Whitney, 1947; Holm, 1979) in usage between lower SES (↓SES; rate $= 1$) and upper SES (↑SES) human writers. Each point indicates the frequency ratio of a feature in a model's (or human's) output compared to human text from the ↓SES group. These comparisons are presented across four models and three prompts (see the paper's section on prompts). Feature types are color-coded: Biber features are cyan, length-specific features are dark violet, PoS tags are red, and style features are black.
  • Figure 3: Comparison of Linguistic Features on YouTube. This plot displays only the linguistic features, not present in Reddit results, that show a statistically significant difference ($p<0.01$, corrected) between lower SES (↓SES) and upper SES (↑SES) human writers. Each point represents the frequency ratio of a feature relative to the ↓SES human group. Feature types are color-coded: Biber (cyan), length (dark violet), PoS (red), and style (black).
  • Figure 4: Average absolute logarithm of the ratio between model and human text across increasing context.
  • Figure 5: Forest Plots Comparing Linguistic Features of Humans and Models on Reddit; Prompt 1, Full Features. The plots display linguistic features between lower SES (↓SES; rate $=1$) and upper SES (↑SES) human writers. Each point indicates the frequency ratio of a feature in a model's (or human's) output compared to human text from the ↓SES group. These comparisons are shown across four models and three prompts (see the paper's section on prompts). Feature types are color-coded: Biber features are cyan, length-specific features are dark violet, PoS tags are red, and style features are black.
  • ...and 5 more figures
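
The two quantities named in the captions above, the frequency ratio relative to the ↓SES human group (Figures 2, 3, and 5) and the average absolute logarithm of the model-to-human ratio (Figure 4), can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names, the per-1,000-token rates, the use of the natural logarithm, and the choice to skip features with a zero rate are not taken from the paper, whose exact normalization may differ.

```python
import math


def frequency_ratio(feature_rate, baseline_rate):
    """Rate of a feature in one group divided by its rate in the baseline group.

    With the lower-SES human text as baseline, that group sits at ratio 1.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return feature_rate / baseline_rate


def mean_abs_log_ratio(model_rates, human_rates):
    """Average |log(model / human)| over features with positive rates in both.

    A value of 0 means the model matches the human rates exactly; over- and
    under-use by the same factor contribute equally.
    """
    shared = [
        name for name in model_rates
        if name in human_rates and model_rates[name] > 0 and human_rates[name] > 0
    ]
    if not shared:
        raise ValueError("no shared features with positive rates")
    total = sum(abs(math.log(model_rates[n] / human_rates[n])) for n in shared)
    return total / len(shared)


# Hypothetical rates per 1,000 tokens, invented purely for illustration.
lower_ses_human = {"first_person_pronouns": 40.0, "nominalizations": 10.0}
upper_ses_human = {"first_person_pronouns": 30.0, "nominalizations": 15.0}
model_output = {"first_person_pronouns": 20.0, "nominalizations": 20.0}

# Upper-SES human text expressed relative to the lower-SES baseline.
ratios = {
    name: frequency_ratio(upper_ses_human[name], lower_ses_human[name])
    for name in lower_ses_human
}

# Distance of the model's output from the lower-SES human text.
distance = mean_abs_log_ratio(model_output, lower_ses_human)

print(ratios)
print(round(distance, 3))
```

Working on a log scale treats a feature used twice as often and one used half as often as equally far from the human baseline, which is one plausible reason to average absolute log ratios rather than raw ratios.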