ArguSense: Argument-Centric Analysis of Online Discourse

Arman Irani; Michalis Faloutsos; Kevin Esterling

TL;DR

ArguSense presents a comprehensive framework for quantifying dialogical argumentation in online forums by combining (i) aspect-based argument detection, (ii) cross-argument similarity and clustering, (iii) cluster summarization, and (iv) thread-level deliberation modeling via semantically enhanced graphs. The authors apply the framework to a large Reddit GMO debate across four communities over 21 months, uncovering that about 27% of posts contain arguments, with in-favor arguments receiving more upvotes and argumentative posts often being substantial in length. They introduce metrics such as Deliberation Intensity Score (DIS) and PostRank-based argument importance to capture the depth and structure of deliberation beyond raw thread size. The work demonstrates the feasibility and value of a unified, scalable pipeline for analyzing online deliberation, with implications for policymaking, public discourse understanding, and steering constructive dialogue. The study also cautions about ethical considerations and biases inherent in data and modeling choices, emphasizing responsible deployment and potential extensions to additional topics and platforms.

Abstract

How can we model arguments and their dynamics in online forum discussions? The meteoric rise of online forums presents researchers across different disciplines with an unprecedented opportunity: we have access to texts containing discourse between groups of users generated in a voluntary and organic fashion. Most prior work has focused on classifying individual monological comments as either argumentative or not argumentative. However, few efforts quantify and describe the dialogical processes between users found in online forum discourse: the structure and content of interpersonal argumentation. Modeling dialogical discourse requires the ability to identify the presence of arguments, group them into clusters, and summarize the content and nature of clusters of arguments within a discussion thread in the forum. In this work, we develop ArguSense, a comprehensive and systematic framework for understanding arguments and debate in online forums. Our framework consists of methods for, among other things: (a) detecting argument topics in an unsupervised manner; (b) describing the structure of arguments within threads with powerful visualizations; and (c) quantifying the content and diversity of threads using argument similarity and clustering algorithms. We showcase our approach by analyzing the discussions of four communities on the Reddit platform over a span of 21 months. Specifically, we analyze the structure and content of threads related to GMOs in forums related to agriculture or farming to demonstrate the value of our framework.

Paper Structure (23 sections, 3 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
Data and Definitions
Methodology
A. Our Argument Level Methods
Aspect detection.
B. Our Argument Grouping Methods
Argument Similarity.
Argument Clustering.
Summarizing clusters of arguments.
C. Our Thread Level Methods
The deliberation profile of a thread.
Representing threads using graphs.
Thread Deliberation Intensity.
Identifying important arguments within a thread.
Argument Stance Dependence.
...and 8 more sections

Figures (9)

Figure 1: A visual representation of the deliberation of a real Reddit thread around the GMO debate: 18 posts, 13 aspect-based posts, 7 argumentative posts, 4 different aspects, depth of 7, and a fan out of 7. The color of a node corresponds to an aspect and the color of the border to the stance towards that aspect.
Figure 2: Breakdown of Argumentative Posts in our $T_{GMO}$ dataset. Soil Science dominates the argumentative space, but there remains a relatively equal distribution of argumentative stance for all GMO aspects.
Figure 3: A visual representation of our methodology: the major phases and key capabilities.
Figure 4: The CCDF of post length, in number of words, for different types of argument stance. Interestingly, posts in favor of an aspect tend to be longer than posts arguing against an aspect, while posts with no argument are the shortest.
Figure 5: Positive stance posts get more upvotes: Upvote probability distribution based upon the stance of the argument. We do not count the default first upvote that every post on Reddit receives.
...and 4 more figures
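
The structural quantities quoted in the Figure 1 caption (number of posts, depth, fan-out) can be computed from a thread's reply tree. The following is a minimal Python sketch, not the authors' implementation. It assumes that depth is the number of posts on the longest root-to-leaf path and that fan-out is the largest number of direct replies to any single post; these definitions, the `parent_of` input format, and the function name `thread_shape` are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict


def thread_shape(parent_of):
    """Return (num_posts, depth, max_fan_out) for a reply tree.

    parent_of maps each post id to the id of the post it replies to,
    with None marking the root post. This input format is hypothetical.
    """
    children = defaultdict(list)
    roots = []
    for post, parent in parent_of.items():
        if parent is None:
            roots.append(post)
        else:
            children[parent].append(post)

    # Depth (assumed definition): posts on the longest root-to-leaf path.
    # Iterative traversal avoids recursion limits on very deep threads.
    depth = 0
    stack = [(root, 1) for root in roots]
    while stack:
        post, level = stack.pop()
        depth = max(depth, level)
        stack.extend((child, level + 1) for child in children.get(post, []))

    # Fan-out (assumed definition): most direct replies to a single post.
    max_fan_out = max((len(c) for c in children.values()), default=0)
    return len(parent_of), depth, max_fan_out


# Toy thread: "a" is the root; "b" and "c" reply to "a"; "d" replies to "b"; "e" replies to "d".
toy_thread = {"a": None, "b": "a", "c": "a", "d": "b", "e": "d"}
shape = thread_shape(toy_thread)
```

For the toy thread, `shape` holds the post count, depth, and fan-out in that order. Aspect and stance labels, which Figure 1 encodes as node and border colors, would be additional per-post attributes and are outside this sketch.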
