A Discourse Analysis Framework for Legislative and Social Media Debates

Arman Irani, Ju Yeon Park, Kevin Esterling, Michalis Faloutsos

TL;DR

DALiSM introduces a data-driven, argument-centric framework for modeling deliberation across offline and online debates. It combines argument detection, argument extraction, thematic similarity, and six novel metrics to quantify deliberation dynamics. Sliding-window argument extraction is paired with all-mpnet-base-v2 sentence embeddings and HDBSCAN clustering to derive narrative structure, while Debater Diversity and Argumentativeness capture participation. The core contribution is the Deliberation Intensity Score, a tunable measure of diversity and engagement that supports cross-platform comparison, demonstrated on U.S. congressional hearings and Reddit discussions of abortion and GMOs. The interactive DALiSM platform lets researchers visualize argument flow, cluster narratives, and summarize key points, offering a scalable tool for studying deliberative democracy in practice, with implications for policy analysis and computational social science.
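To make the thematic-similarity step concrete, here is a minimal sketch of grouping argument vectors by cosine similarity. It is an illustration under stated assumptions, not the DALiSM implementation: the vectors are toy 2-D stand-ins for sentence embeddings, a greedy threshold rule is swapped in for HDBSCAN so the example needs no external libraries, and the names `cosine`, `group_by_similarity`, and the `threshold` value are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_by_similarity(vectors: List[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Greedy single-pass grouping (a simple stand-in for HDBSCAN, not HDBSCAN itself).

    Each vector joins the first group whose seed vector it resembles
    (cosine >= threshold); otherwise it starts a new group.
    Returns one integer group label per input vector.
    """
    seeds: List[int] = []   # index of the first member of each group
    labels: List[int] = []
    for i, vec in enumerate(vectors):
        for label, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(label)
                break
        else:
            # No existing group is similar enough: open a new one.
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels


# Toy "embeddings": two arguments near the x-axis, two near the y-axis.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
toy_labels = group_by_similarity(toy_vectors)
```

In a real pipeline the toy vectors would be replaced by embeddings of the extracted arguments, and a density-based method such as HDBSCAN would avoid the order dependence of this greedy rule.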

Abstract

How can we capture the dynamics of deliberation in a debate? In an increasingly divided and misinformed world, understanding the relationship between who is arguing and what they are arguing about is becoming critical for fostering a meaningful exchange of ideas. Given the vast array of platforms on which people express their viewpoints and deliberate on issues, how can we develop methods to analyze these processes accurately? Fortunately, debate data is abundant, ranging from (a) formal proceedings, such as committee hearings in legislatures, to (b) online discussion forums, such as Reddit. Here we introduce DALiSM, a data-driven, argument-centric framework for analyzing discourse dynamics in diverse, multi-party spaces at scale. We develop methods that harness and extend the state of the art in computational argumentation for (a) identifying arguments in long-form raw texts, (b) calculating the intensity of deliberation, and (c) modeling the evolution of discourse over time. We deploy our framework as an interactive dashboard for dynamically viewing the outputs of DALiSM and understanding the nature of a discourse event. To showcase the utility of DALiSM, we apply our framework to U.S. congressional committee hearings from 2005 to 2023 (109th to 117th Congresses) and to selected Reddit communities from 2008 to 2023. This case study reveals substantive insights into deliberative behavior in online and offline spaces.

Paper Structure (20 sections, 8 equations, 7 figures)

Figures (7)

  • Figure 1: DALiSM platform visualizing Reddit discussions: Participants on the y-axis, time on the x-axis. Each colored box shows an argument, with matching colors representing similar points. The system summarizes content and measures discussion quality. Usernames are blurred for privacy.
  • Figure 2: An example of our automated sliding-window Argument Detection process. Here the second window has the higher argument confidence, so we treat its text unit as more likely to be an argument than the first window's.
  • Figure 3: The evolution of discourse on Reddit and congressional hearings regarding the topic of abortion. We observe more volatility in Reddit discussions, especially in the later stages, with an overall smaller semantic difference than the trend observed in congressional hearings.
  • Figure 4: Establishing an informative deliberation profile with our metrics: We show the average and the variance of our metrics for each deliberation. Congressional hearings exhibit much higher Argument Diversity and Narrative Coherence than Reddit threads, which aligns with the nature of the two venues of deliberation.
  • Figure 5: Do legislators and witnesses use similar arguments, and how is this affected by which party is in control? This figure quantifies the difference in argument-level agreement in congressional hearings when the legislator is in the "party of power" versus when they are not. A positive value indicates greater similarity when the party is in control of the chamber, possibly implying that legislators in power call witnesses who agree with them.
  • ...and 2 more figures
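
The sliding-window idea described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' detector: the scorer `toy_confidence`, the cue-word list `ARGUMENT_CUES`, the function `best_window`, and the window size are all hypothetical stand-ins for a trained argument classifier and its settings.

```python
from typing import Callable, List, Tuple

# Hypothetical cue words; a real system would use a trained argument classifier.
ARGUMENT_CUES = {"because", "therefore", "should", "must", "since", "thus"}


def toy_confidence(text: str) -> float:
    """Toy stand-in for a classifier: the fraction of tokens that are cue words."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    if not tokens:
        return 0.0
    return sum(t in ARGUMENT_CUES for t in tokens) / len(tokens)


def best_window(
    sentences: List[str],
    size: int = 2,
    score: Callable[[str], float] = toy_confidence,
) -> Tuple[int, str, float]:
    """Slide a window of `size` sentences over the text and keep the top-scoring one.

    Returns (start index, window text, confidence). Ties keep the earliest window.
    """
    if not sentences:
        raise ValueError("sentences must not be empty")
    size = min(size, len(sentences))
    best: Tuple[int, str, float] = (0, "", -1.0)
    for start in range(len(sentences) - size + 1):
        text = " ".join(sentences[start:start + size])
        conf = score(text)
        if conf > best[2]:
            best = (start, text, conf)
    return best


example = [
    "Thank you all for coming.",
    "We have a long agenda today.",
    "We should fund this program because it saves lives.",
    "Therefore the committee must act.",
]
start, window_text, confidence = best_window(example)
```

With the toy scorer, the window covering the last two sentences wins because it contains the most cue words per token. Swapping `score` for a real classifier's confidence function leaves the windowing logic unchanged.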