
Highly engaging events reveal semantic and temporal compression in online community discourse

Antonio Desiderio, Anna Mancini, Giulio Cimini, Riccardo Di Clemente

TL;DR

The study leverages Reddit conversation data, exploiting the platform's community-based structure, to elucidate how offline events influence online user interactions and behavior; the patterns it uncovers represent a fingerprint of how online dynamics change in response to real-world occurrences.

Abstract

People nowadays express their opinions in online spaces through different forms of interaction, such as posting, sharing, and discussing with one another. How do these digital traces change in response to events happening in the real world? We leverage Reddit conversation data, exploiting its community-based structure, to elucidate how offline events influence online user interactions and behavior. Online conversations, in the form of posts and comments, are analysed along their temporal and semantic dimensions. Around highly engaging events, conversations tend to become repetitive with a more limited vocabulary, develop at a faster pace, and feature heightened emotions. As the event approaches, the shifts occurring in conversations are reflected in the users' dynamics. Users become more active and exchange information with a growing audience, despite using a less rich vocabulary and more repetitive messages. The recurring patterns we discovered persist across a wide range of events and contexts, representing a fingerprint of how online dynamics change in response to real-world occurrences.

Paper Structure (23 sections, 2 equations, 34 figures, 6 tables)

Figures (34)

  • Figure 1: Burst of Activity and Conversation Characterization. In subplots A-B, a 7-day moving average is applied to the time series. A) Number of posts (solid line) and comments (dashed line) for the U.S. politics community (upper panel) and the European community (lower panel). The grey vertical dashed-dotted lines mark the highly engaging events and correspond to the peaks of the signals. B) Number of posts compared with Google Trends for the NBA community (upper panel) and the NFL community (lower panel). C) Radar plots showing, for each subreddit, the average Z-scored hourly activity in the week before the event (dashed line) and in the week of the event (solid line), with the shaded area representing the standard deviation. The Amsterdam timezone is used for the European community and the US/Eastern timezone for all other communities. D) Schematic representation of how we characterise a conversation. For each post, we capture the temporal dimension as the time series obtained by counting the comments underneath it within $\Delta t = 5$-minute intervals, and the semantic dimension by merging all the comments into a single text, whose compression is computed as the ratio between the number of unique word patterns (in red) and the total number of words (unique and repeated).
  • Figure 2: Temporal Dimension. A) Average Dynamic Time Warping (solid line) and coherence (dashed line) distances between the conversations of each week and those of the previous week, for each subreddit. B) Average reply speed for each subreddit. In all panels, the grey vertical dashed-dotted lines mark the events.
  • Figure 3: Semantic Dimension. A) Percentage change in compression between the week associated with the event (darker) and the week before (lighter) for each subreddit. The grey shaded vertical area is the standard deviation of the mean change between one week and the preceding week. B) Jaccard index of the statistically relevant bi-grams between every pair of weeks; the lighter the color, the more dissimilar the weeks. Events are marked with grey lines. C) Emotion variation for each subreddit between consecutive weeks. The triangles mark the variation associated with the events.
  • Figure 4: Users' dynamics. In all subplots, the left panels show data for users active on r/NBA during the NBA Trades, and the right panels show data for users active on r/politics during the 2020 U.S. election. A) The central panels show the relation between each user's frequency of activity and the number of interacting peers (the degree). The marginal plots report the survival function of each variable for the two weeks. B) The density plots show the variations in the peers' degree and semantic diversity. C) The panels show the relation between users' compression and frequency of activity. Marginal plots report the survival function of each variable for the two weeks.
  • Figure S1: Daily Z-score variation of Post Activity in Subreddits. The solid line represents the daily Z-score variation of the number of posts for the following subreddits: U.S. politics (panel A), European (panel B), NBA (panel C), and NFL (panel D). The grey vertical dashed-dotted lines indicate highly engaging events, which correspond to the peaks in the Z-score variation, signifying increased subreddit activity during these times.
  • ...and 29 more figures
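The smoothing and standardisation mentioned in the Figure 1 and Figure S1 captions can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `moving_average` and `z_scores` are hypothetical, the moving average is assumed to be trailing (the captions do not say whether it is centred), and the Z-score is assumed to use the mean and population standard deviation of the whole series, since the captions do not specify the baseline.

```python
from statistics import mean, pstdev


def moving_average(series, window=7):
    """Trailing moving average (hypothetical helper; window=7 mirrors the
    7-day smoothing in Figure 1A-B). The first window-1 points have no full
    window and are dropped."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [mean(series[i - window + 1 : i + 1])
            for i in range(window - 1, len(series))]


def z_scores(series):
    """Standardise a series against its own mean and population standard
    deviation (assumed baseline; the captions do not specify one)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series has no variation to standardise.
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


# Usage: made-up daily post counts with one burst, for illustration only.
daily_posts = [120, 130, 125, 128, 900, 135, 127, 122, 131, 126]
smoothed = moving_average(daily_posts)
print(len(smoothed))                    # → 4
print(max(z_scores(daily_posts)) > 2)   # → True
```

With a burst like this, the Z-score isolates the event day from ordinary activity, which is the kind of peak the grey dashed-dotted lines in Figure S1 mark.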
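The conversation characterisation in Figure 1D can be sketched in Python as below. It is a sketch under stated assumptions, not the paper's pipeline: `comment_time_series` and `word_compression` are hypothetical names, timestamps are assumed to be Unix seconds, and "unique patterns of words" is interpreted here as the phrases of an LZ78-style parse over words, which is our assumption about how the patterns are extracted.

```python
import re

DELTA_T = 5 * 60  # bin width in seconds (Delta t = 5 minutes, as in Figure 1D)


def comment_time_series(post_time, comment_times, delta_t=DELTA_T):
    """Temporal dimension: count comments in consecutive delta_t-second bins
    after the post. Comments timestamped before the post are ignored."""
    offsets = [t - post_time for t in comment_times if t >= post_time]
    if not offsets:
        return []
    series = [0] * (int(max(offsets) // delta_t) + 1)
    for off in offsets:
        series[int(off // delta_t)] += 1
    return series


def word_compression(comments):
    """Semantic dimension: merge comments into one text and return the ratio
    of distinct word patterns to total words. Patterns come from an
    LZ78-style parse (assumption): extend the current phrase word by word
    until it has not been seen before, record it, then start a new phrase."""
    words = re.findall(r"\w+", " ".join(comments).lower())
    if not words:
        return 0.0
    seen = set()
    phrase = ()
    for w in words:
        phrase = phrase + (w,)
        if phrase not in seen:
            seen.add(phrase)
            phrase = ()
    return len(seen) / len(words)


# Usage: a repetitive thread yields a lower ratio than a varied one.
print(word_compression(["a b a b a b"]))  # → 0.5
print(word_compression(["x y z"]))        # → 1.0
```

Under this reading, a ratio near 1 means almost every word opens a new pattern, while repeated phrasing pushes the ratio down, matching the intuition of more repetitive conversations.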
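Two of the week-to-week comparisons in Figures 2A and 3B can be illustrated with textbook definitions. The sketch below is hypothetical: `dtw_distance` is the classic dynamic-programming Dynamic Time Warping with an absolute-difference cost, `mean_weekly_dtw` averages it over all pairs of conversations from two weeks (the pairing scheme is our assumption), and the bi-gram sets fed to `jaccard` are unfiltered, whereas the paper restricts them to statistically relevant bi-grams using a test the captions do not describe.

```python
import re


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping with |x - y| cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # b[j-1] matched again
                                    cost[i][j - 1],      # a[i-1] matched again
                                    cost[i - 1][j - 1])  # advance both series
    return cost[n][m]


def mean_weekly_dtw(week, previous_week):
    """Average DTW distance over all pairs of conversation time series
    (hypothetical pairing scheme, not taken from the paper)."""
    pairs = [(x, y) for x in week for y in previous_week]
    if not pairs:
        raise ValueError("both weeks need at least one conversation")
    return sum(dtw_distance(x, y) for x, y in pairs) / len(pairs)


def bigrams(text):
    """Set of consecutive word pairs in a lower-cased text (no filtering)."""
    words = re.findall(r"\w+", text.lower())
    return set(zip(words, words[1:]))


def jaccard(s1, s2):
    """Size of the intersection over size of the union; two empty sets are
    treated as identical (1.0)."""
    union = s1 | s2
    if not union:
        return 1.0
    return len(s1 & s2) / len(union)


# Usage: bi-gram overlap between two made-up weekly texts.
week1 = "the election results are in the election results"
week2 = "the election results are final"
print(jaccard(bigrams(week1), bigrams(week2)))  # → 0.5
```

DTW is used here because it aligns series that share a shape but unfold at different speeds, so two comment bursts offset by a few bins still score as close.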