The Evolution of Language in Social Media Comments

Niccolò Di Marco, Edoardo Loru, Anita Bonetti, Alessandra Olga Grazia Serra, Matteo Cinelli, Walter Quattrociocchi

TL;DR

This study investigates the linguistic characteristics of user comments over 34 years, focusing on their complexity and temporal shifts. The observed trends reflect a broader, universal pattern of human behavior, suggesting intrinsic linguistic tendencies of users when interacting online.

Abstract

Understanding the impact of digital platforms on user behavior presents foundational challenges, including issues related to polarization, misinformation dynamics, and variation in news consumption. Comparative analyses across platforms and over different years can provide critical insights into these phenomena. This study investigates the linguistic characteristics of user comments over 34 years, focusing on their complexity and temporal shifts. Using a dataset of approximately 300 million English comments from eight diverse platforms and topics, we examine the vocabulary size and linguistic richness of user communications and their evolution over time. Our findings reveal consistent patterns of complexity across social media platforms and topics, characterized by a nearly universal reduction in text length and diminished lexical richness, but also decreased repetitiveness. Despite these trends, users consistently introduce new words into their comments at a nearly constant rate. This analysis underscores that platforms only partially influence the complexity of user comments. Instead, that complexity reflects a broader, universal pattern of human behavior, suggesting intrinsic linguistic tendencies of users when interacting online.

Paper Structure (21 sections, 7 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: CCDF of the distributions of number of $(a)$ tokens and $(b)$ types used by each user.
  • Figure 2: Distribution of the number of types (i.e., unique words) employed by users according to their activity class, which is determined by the number of comments they left in each specific dataset.
  • Figure 3: Distribution of the area under the curves determined by users' progressive exploration of their vocabulary.
  • Figure 4: Distribution of $(a)$ gzip complexity and $(b)$ $K$-complexity for users having at least 20 comments. For the larger dataset, we selected a sample of 50,000 users to compute $K$-complexity. For visual reasons, we add 1 to all the values of $K$-complexity.
  • Figure 5: Evolution of the mean number of types in each dataset. The smooth curves are obtained using a LOESS regression.
  • ...and 5 more figures
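
The captions above refer to tokens, types (unique words), and a gzip-based complexity. As a rough illustration of how such per-user quantities might be computed, here is a minimal Python sketch. The tokenizer, the function names, and the use of a compressed-to-raw size ratio are all assumptions made for illustration; this section does not state the paper's exact tokenization or its definition of gzip complexity.

```python
import gzip
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and keep runs of
    # letters and apostrophes. The paper's tokenization may differ.
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_stats(comments):
    # Return (number of tokens, number of types) for one user's comments.
    # Tokens are all word occurrences; types are the distinct words.
    tokens = [tok for comment in comments for tok in tokenize(comment)]
    return len(tokens), len(set(tokens))


def gzip_ratio(comments):
    # Assumed proxy for gzip complexity: compressed size divided by raw
    # size of the user's concatenated comments. Lower values indicate
    # more repetitive (more compressible) text.
    raw = " ".join(comments).encode("utf-8")
    if not raw:
        return 0.0
    return len(gzip.compress(raw)) / len(raw)
```

For example, `vocabulary_stats(["the cat sat", "the cat ran"])` returns `(6, 4)`: six tokens, four of which are distinct. Note that for very short texts the fixed gzip header dominates, so the ratio can exceed 1; this is one reason such measures are usually restricted to users with enough comments.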