Syntactic Evolution in Language Usage
Surbhit Kumar
TL;DR
This paper investigates how English syntax evolves across the lifespan by analyzing blogger.com text from 2002–2004 across three age groups. It combines extensive syntactic feature extraction with PCA-based dimensionality reduction and a two-layer stacked ensemble to predict a writer's age group, and it benchmarks real blog data against GPT-4-generated text. Findings indicate that real blog text exhibits increasing syntactic complexity with age, whereas GPT-4 outputs show weaker, less consistent age-related patterns. Classification performance is modest (around 40% on balanced data, about 30% on new GPT-4 text). The work highlights the difficulty AI systems have in replicating style across domains and underscores the need for diverse data and robust modeling to capture demographic-driven language variation in digital communication.
Abstract
This research investigates how linguistic style changes across stages of life, from the post-teenage years to old age. Using linguistic analysis tools and methodologies, the study examines how individuals adapt and modify their language use over time. The research uses a dataset of blogs from blogger.com from 2004 and restricts the syntactic analysis to English. The findings have implications for linguistics, psychology, and communication studies, shedding light on the relationship between age and language.
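To make the idea of syntactic feature extraction concrete, the following is a minimal sketch of how a few surface-level proxies for syntactic complexity might be computed from raw blog text. It is an illustration only and not the feature set used in this study: the function names, the three features, and the `SUBORDINATORS` word list are all assumptions introduced here, and a real analysis would more plausibly rely on a part-of-speech tagger or parser.

```python
import re

# Hypothetical, illustrative word list (an assumption, not taken from the paper).
SUBORDINATORS = {
    "because", "although", "though", "while", "whereas", "since",
    "unless", "if", "that", "which", "who", "when",
}


def split_sentences(text):
    """Naively split text into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic word tokens (with apostrophes)."""
    return re.findall(r"[A-Za-z']+", sentence.lower())


def syntactic_proxies(text):
    """Return crude surface proxies for syntactic complexity of a text."""
    sentences = split_sentences(text)
    tokens_per_sentence = [tokenize(s) for s in sentences]
    n_sent = len(sentences)
    n_tok = sum(len(toks) for toks in tokens_per_sentence)
    if n_sent == 0 or n_tok == 0:
        # Empty or non-alphabetic input: report zeros instead of dividing by zero.
        return {
            "mean_sentence_length": 0.0,
            "commas_per_sentence": 0.0,
            "subordinator_rate": 0.0,
        }
    n_sub = sum(
        1 for toks in tokens_per_sentence for t in toks if t in SUBORDINATORS
    )
    return {
        "mean_sentence_length": n_tok / n_sent,          # tokens per sentence
        "commas_per_sentence": text.count(",") / n_sent,  # rough clause-density proxy
        "subordinator_rate": n_sub / n_tok,               # share of subordinating words
    }
```

Calling `syntactic_proxies` on each blog post would yield one small feature vector per post. Vectors of this kind, in far richer form, are what a dimensionality-reduction step and a classifier would then consume.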
