Curious Rhythms: Temporal Regularities of Wikipedia Consumption

Tiziano Piccardi, Martin Gerlach, Robert West

TL;DR

Even after removing the global pattern of day-night alternation, the consumption habits of individual Wikipedia articles maintain strong diurnal regularities. The paper also characterizes the prototypical shapes of these consumption patterns, emphasizing Wikipedia's role as a rich knowledge base that fulfills information needs spread throughout the day.

Abstract

Wikipedia, in its role as the world's largest encyclopedia, serves a broad range of information needs. Although previous studies have noted that Wikipedia users' information needs vary throughout the day, there is to date no large-scale, quantitative study of the underlying dynamics. The present paper fills this gap by investigating temporal regularities in daily consumption patterns in a large-scale analysis of billions of timezone-corrected page requests mined from English Wikipedia's server logs, with the goal of investigating how context and time relate to the kind of information consumed. First, we show that even after removing the global pattern of day-night alternation, the consumption habits of individual articles maintain strong diurnal regularities. Then, we characterize the prototypical shapes of consumption patterns, finding a particularly strong distinction between articles preferred during the evening/night and articles preferred during working hours. Finally, we investigate topical and contextual correlates of Wikipedia articles' access rhythms, finding that article topic, reader country, and access device (mobile vs. desktop) are all important predictors of daily attention patterns. These findings shed new light on how humans seek information on the Web by focusing on Wikipedia as one of the largest open platforms for knowledge and learning, emphasizing Wikipedia's role as a rich knowledge base that fulfills information needs spread throughout the day, with implications for understanding information seeking across the globe and for designing appropriate information systems.

Paper Structure (15 sections, 2 equations, 14 figures)

Figures (14)

  • Figure 1: Top: Total number of pageloads from external origin by hour of the week, averaged over four weeks. Bottom: Idem, stratified by desktop and mobile.
  • Figure 2: Top/blue: Contribution of each frequency to baseline rhythm $\Pr(h)$ of Wikipedia access volume (measured as the fraction of total variance explained). Bottom/red: Contribution of each frequency to article-specific divergence $D_a(h)$ from baseline rhythm (Eq. \ref{eqn:baseline_removal}) (computed per article $a$, then averaged over articles).
  • Figure 3: Daily pattern
  • Figure 4: Divergence
  • Figure 6: Four principal components of the daily access volume time series, capturing 73.6% of total variance.
  • ...and 9 more figures
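
The Figure 2 caption describes the contribution of each frequency as a fraction of total variance explained. The following is a minimal sketch of one common way to compute such shares for an hourly time series, using a discrete Fourier transform. It is not the authors' pipeline: the function name `variance_share_by_frequency` and the synthetic weekly signal are assumptions made purely for illustration.

```python
import numpy as np


def variance_share_by_frequency(series):
    """Return the fraction of total variance carried by each frequency.

    Index k of the result corresponds to k full cycles over the length
    of the input series (hypothetical helper, for illustration only).
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    x = x - x.mean()  # the mean carries no variance, so remove it

    power = np.abs(np.fft.rfft(x)) ** 2

    # For real input, rfft keeps only non-negative frequencies. Every bin
    # except the zero bin (and the Nyquist bin when n is even) stands for
    # a pair of mirrored frequencies, so it counts twice.
    weights = np.full(power.shape, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0
    power = power * weights

    total = power.sum()
    if total == 0.0:
        return np.zeros_like(power)  # constant series: no variance to split
    return power / total


# Synthetic example (assumed data): one week of hourly values containing
# a 24-hour cycle and a weaker 12-hour cycle.
hours = np.arange(168)
signal = np.cos(2 * np.pi * hours / 24) + 0.5 * np.cos(2 * np.pi * hours / 12)
shares = variance_share_by_frequency(signal)

daily_share = shares[7]       # 7 cycles per week = 24-hour period
half_day_share = shares[14]   # 14 cycles per week = 12-hour period
print(round(daily_share, 3), round(half_day_share, 3))
```

In this synthetic case the 24-hour component accounts for about 80% of the variance and the 12-hour component for about 20%, because a cosine's variance is proportional to its squared amplitude. Real access logs would spread variance over many more frequencies.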