Characterizing the Structure of Online Conversations Across Reddit

Yulin Yu, Julie Jiang, Paramveer Dhillon

TL;DR

A comprehensive statistical analysis of a year's worth of Reddit data is conducted, revealing that both local and global features contribute significantly to explaining structural variation in discussion trees, but local features collectively have a greater impact, accounting for a larger proportion of variation in the width, depth, and size of discussion trees.

Abstract

The proliferation of social media platforms has afforded social scientists unprecedented access to vast troves of data on human interactions, facilitating the study of online behavior at an unparalleled scale. These platforms typically structure conversations as threads, forming tree-like structures known as "discussion trees." This paper examines the structural properties of online discussions on Reddit by analyzing both global (community-level) and local (post-level) attributes of these discussion trees. We conduct a comprehensive statistical analysis of a year's worth of Reddit data, encompassing a quarter of a million posts and several million comments. Our primary objective is to disentangle the relative impacts of global and local properties and evaluate how specific features shape discussion tree structures. The results reveal that both local and global features contribute significantly to explaining structural variation in discussion trees. However, local features, such as post content and sentiment, collectively have a greater impact, accounting for a larger proportion of variation in the width, depth, and size of discussion trees. Our analysis also uncovers considerable heterogeneity in the impact of various features on discussion structures. Notably, certain global features play crucial roles in determining specific discussion tree properties. These features include the subreddit's topic, age, popularity, and content redundancy. For instance, posts in subreddits focused on politics, sports, and current events tend to generate deeper and wider discussion trees. This research enhances our understanding of online conversation dynamics and offers valuable insights for both content creators and platform designers. By elucidating the factors that shape online discussions, our work contributes to ongoing efforts to improve the quality and effectiveness of digital discourse.

Paper Structure

This paper contains 28 sections, 2 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Illustration of a Reddit post and its comment tree. In this paper, we seek to explain the structure of these discussion/comment trees using local and global properties of the post title. Note: 1) A given post on Reddit can be upvoted or downvoted. The karma of a post is the difference between the number of upvotes and downvotes. 2) A Reddit post is a submission made by a user in a subreddit. A post can, in turn, attract comments.
  • Figure 2: Empirical distribution of two global features: "average quality of posts" ($\mu$=59.6, $\sigma$=134.1, min=1.1, max=1634.7, median=21.1) and the "redundancy of content" ($\mu$=.62, $\sigma$=.05, min=.45, max=.96, median=.61).
  • Figure 3: Empirical distribution of the non-binary local features: "quality" ($\mu$=68, $\sigma$=696.9, min=0, max=88908, median=6), "length" ($\mu$=50.5, $\sigma$=37.7, min=1, max=341, median=41), and "sentiment" ($\mu$=.055, $\sigma$=.322, min=-.987, max=.999, median=.000).
  • Figure 4: Illustration of a discussion along with its various structural properties.
  • Figure 5: Empirical distribution of the various structural properties of the discussion trees: "Size" ($\mu$=14.3, $\sigma$=59.7, min=1, max=14001, median=6), "Width" ($\mu$=6.24, $\sigma$=29.90, min=1, max=9499, median=3), and "Depth" ($\mu$=3.38, $\sigma$=3.32, min=1, max=828, median=3).
  • ...and 2 more figures
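
The structural properties named in the captions above (size, width, depth) can be illustrated with a minimal sketch. The paper's exact definitions are not reproduced in this summary, so the code below makes assumptions: the post itself counts as the root node at level 1 (consistent with the minimum of 1 reported for all three properties in Figure 5), size is the total number of nodes, width is the largest number of nodes at any single level, and depth is the number of levels. The function name `tree_stats` and the `parent_of` input format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def tree_stats(parent_of, root):
    """Return (size, width, depth) of a discussion tree.

    parent_of: dict mapping each comment id to the id of its parent
               (either the post or another comment). Hypothetical format.
    root:      id of the post that starts the discussion.

    Assumed definitions (not taken from the paper):
      size  = number of nodes, counting the post itself
      width = maximum number of nodes on any one level
      depth = number of levels, with the post on level 1
    """
    # Build a parent -> children adjacency list.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    # Breadth-first traversal, counting the nodes found on each level.
    level_counts = []
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level == len(level_counts):
            level_counts.append(0)
        level_counts[level] += 1
        for child in children.get(node, ()):
            queue.append((child, level + 1))

    return sum(level_counts), max(level_counts), len(level_counts)


# Toy discussion: post "p" receives two top-level comments (c1, c2);
# c3 replies to c1, and c4 replies to c3.
example = {"c1": "p", "c2": "p", "c3": "c1", "c4": "c3"}
print(tree_stats(example, "p"))  # (5, 2, 4)
```

Under these assumptions, a post with no comments yields (1, 1, 1), which matches the minimum values reported in Figure 5.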