Temporal patterns of preferences through Wikipedia editing in different languages

David André Villamil Carrillo, Yérali Gandica

TL;DR

This study analyses over a decade of editorial activity across eleven Wikipedia language editions, representing a diverse set of linguistic and cultural communities, and applies hierarchical clustering with dimensionality reduction via PCA and autoencoders to both static and temporal dimensions of collective behaviour.

Abstract

Temporal editing patterns on Wikipedia provide a unique computational lens to explore cultural dynamics across linguistic communities. This study analyses over a decade of editorial activity (2001-2010) across eleven Wikipedia language editions, representing a diverse set of linguistic and cultural communities. We apply hierarchical clustering with dimensionality reduction via PCA and autoencoders to both static (categorical) and temporal dimensions of collective behaviour. Results reveal that linguistic communities exhibit distinct circadian editing rhythms shaped by cultural and societal factors. Crucially, static and temporal clustering yield substantially different community groupings, demonstrating that time is an essential -- and often neglected -- dimension in cross-cultural computational analyses. These findings contribute to our understanding of how cultural identity manifests in large-scale digital trace data, and offer methodological implications for future studies using online platforms as proxies for collective cultural behaviour.
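As a rough illustration of the pipeline the abstract describes (standardise activity profiles, reduce dimensionality with PCA, then cluster hierarchically), here is a minimal sketch. It is not the authors' implementation: the data are synthetic, and the function name `cluster_profiles`, the per-language row standardisation, the Ward linkage, and the choice of two components are all assumptions made for the example. The autoencoder variant is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_profiles(profiles, n_components=2, n_clusters=2):
    """Standardise each profile, project with PCA, then cluster hierarchically.

    `profiles` has one row per language and one column per hour of the week.
    All parameter choices here are illustrative assumptions.
    """
    X = np.asarray(profiles, dtype=float)
    # Standardise each row so the comparison is about shape, not edit volume.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # PCA via SVD of the column-centred matrix.
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Agglomerative clustering on the PCA scores (Ward linkage is an assumption).
    Z = linkage(scores, method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return scores, Z, labels


# Synthetic weekly profiles (168 hourly bins) for four hypothetical languages.
rng = np.random.default_rng(0)
hours = np.arange(168)


def synthetic_profile(peak_hour, scale):
    """Daily cosine rhythm peaking at `peak_hour`, with a little noise."""
    base = 1.0 + np.cos(2 * np.pi * (hours - peak_hour) / 24)
    return scale * base + rng.normal(0, 0.05, size=hours.size)


profiles = np.array([
    synthetic_profile(14, 100),  # large community, afternoon peak
    synthetic_profile(15, 5),    # small community, similar rhythm
    synthetic_profile(2, 80),    # large community, opposite rhythm
    synthetic_profile(3, 10),    # small community, opposite rhythm
])
scores, Z, labels = cluster_profiles(profiles)
```

In this toy setup the two afternoon-peaking profiles land in one cluster and the two night-peaking profiles in the other, regardless of edit volume. Skipping the standardisation step lets raw volume dominate the distances, which is one reason standardised and non-standardised dendrograms can differ.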

Paper Structure

This paper contains 10 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Average number of edits per hour across all categories for the 11 analysed languages throughout the week, along with the variance for the top 3 categories with the highest activity.
  • Figure 2: Average number of editors per hour across all categories for the 11 analysed languages throughout the week, along with the variance for the top 3 categories with the highest activity.
  • Figure 3: Number of edits per hour over the entire length of the data span for all languages, with Spanish highlighted in red and Portuguese in blue for comparison.
  • Figure 4: Dendrograms of edits and editors for the static clustering of the dataset. Left (a): dendrograms of edits. Right (b): dendrograms of editors. In each panel, the main plot shows the standardised data and the inset shows the non-standardised data.
  • Figure 5: Average weekly edit activity patterns and hierarchical clustering. Top: Absolute (a) and standardised (b) averaged editor activity across different languages. Bottom: Dendrograms showing hierarchical clustering based on averaged editor activity using PCA (c) and Autoencoder (d).
  • ...and 1 more figures
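
The weekly activity curves described in the captions for Figures 1 and 2 can be thought of as counts binned by hour of the week and averaged over weeks. The sketch below shows one way to build such a profile from raw edit timestamps. The function name `hour_of_week_profile` and the example timestamps are hypothetical, and the code assumes timestamps are already expressed in the time zone of interest; how the paper handles time zones and empty weeks is not stated here.

```python
from collections import Counter
from datetime import datetime, timedelta


def hour_of_week_profile(timestamps):
    """Return 168 values: mean edits per hour-of-week slot.

    Slot 0 is Monday 00:00-00:59 and slot 167 is Sunday 23:00-23:59.
    The average is taken over ISO weeks that contain at least one edit,
    a simplification: weeks with no edits at all are not counted.
    """
    counts = Counter()
    weeks = set()
    for ts in timestamps:
        slot = ts.weekday() * 24 + ts.hour
        counts[slot] += 1
        weeks.add(tuple(ts.isocalendar())[:2])  # (ISO year, ISO week)
    n_weeks = max(len(weeks), 1)
    return [counts[s] / n_weeks for s in range(168)]


# Made-up timestamps spanning two weeks, starting on Monday 4 January 2010.
start = datetime(2010, 1, 4)
edits = [
    start + timedelta(hours=9),
    start + timedelta(hours=9, minutes=30),
    start + timedelta(days=7, hours=9),
    start + timedelta(days=8, hours=20),
]
profile = hour_of_week_profile(edits)
```

Counting distinct editors instead of edits would follow the same binning, with a set of editor identifiers per slot and week in place of the counter.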