Social Perceptions of English Spelling Variation on Twitter: A Comparative Analysis of Human and LLM Responses

Dong Nguyen, Laura Rosseel

TL;DR

The paper investigates the social meanings attached to English spelling variation on Twitter and whether large language models mirror human judgments. It adapts a sociolinguistic speaker evaluation paradigm to controlled, meaning-matched tweet pairs and compares human ratings with those of a broad set of LLMs under paired and independent prompting. The study finds generally strong human–LLM alignment for formality, carefulness, and age, but reports notable differences in rating distributions and across types of spelling variation, which are influenced by the prompting strategy. It demonstrates the potential of LLMs as a tool for sociolinguistic data collection and pilot testing, while highlighting their limitations and the need for human validation and careful ethical consideration.

Abstract

Spelling variation (e.g. funnnn vs. fun) can influence the social perception of texts and their writers: we often have various associations with different forms of writing (is the text informal? does the writer seem young?). In this study, we focus on the social perception of spelling variation in online writing in English and study to what extent this perception is aligned between humans and large language models (LLMs). Building on sociolinguistic methodology, we compare LLM and human ratings on three key social attributes of spelling variation (formality, carefulness, age). We find generally strong correlations in the ratings between humans and LLMs. However, notable differences emerge when we analyze the distribution of ratings and when comparing between different types of spelling variation.

Paper Structure

This paper contains 25 sections, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Density plots of the ratings provided in the paired setup by humans and GPT-5 across three attributes: formality, carefulness, and age. The plots are divided based on the type of tweets: those containing conventional (C) versus unconventional (U) spelling variants. GPT-5's ratings display a much greater separation between conventional and unconventional spellings compared to human ratings. Furthermore, GPT-5 has a tendency to respond more frequently with certain numbers, particularly multiples of 5 and 10.
  • Figure 2: A boxplot of the Spearman correlations (rating differences) across the models in the paired setting, per attribute.
  • Figure 3: Start instructions of the Prolific task.
  • Figure 4: An example informality rating task.
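
The quantity summarized in Figure 2 (a Spearman correlation over rating differences in the paired setting) can be illustrated with a minimal sketch. It assumes that a "rating difference" is the rating of the conventional tweet minus the rating of the unconventional tweet in the same pair, and that ratings are numeric; neither detail is confirmed by this summary. All function names and all rating values below are hypothetical and invented for illustration; they are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rating_differences(pairs):
    """Assumed definition: conventional rating minus unconventional rating."""
    return [conventional - unconventional for conventional, unconventional in pairs]


# Invented toy ratings for five tweet pairs: (conventional, unconventional).
human_pairs = [(70, 40), (65, 55), (80, 30), (60, 58), (75, 50)]
model_pairs = [(90, 20), (85, 60), (95, 10), (80, 75), (90, 40)]

human_diff = rating_differences(human_pairs)
model_diff = rating_differences(model_pairs)
rho = spearman(human_diff, model_diff)
print(f"Spearman correlation of rating differences: {rho:.3f}")
```

Because Spearman correlation depends only on rank order, a model can separate conventional and unconventional spellings far more sharply than humans do and still correlate strongly with them, which is consistent with the contrast between Figure 1 and the overall alignment reported in the abstract.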