A Discourse Analysis Framework for Legislative and Social Media Debates
Arman Irani, Ju Yeon Park, Kevin Esterling, Michalis Faloutsos
TL;DR
DALiSM is a data-driven, argument-centric framework for modeling deliberation in both offline and online debates. It combines argument detection, argument extraction, and thematic similarity with six novel metrics that quantify deliberation dynamics. Sliding-window argument extraction is paired with all-mpnet-v2 embeddings and HDBSCAN-based clustering to derive structural narratives, while Debater Diversity and Argumentativeness capture participation. The core contribution is the Deliberation Intensity Score, a robust, tunable measure of diversity and engagement that supports cross-platform comparison; it is demonstrated on U.S. congressional hearings and on Reddit discussions of abortion and GMOs. The interactive DALiSM platform lets researchers visualize argument flow, cluster narratives, and summarize key points, offering a scalable tool for studying deliberative democracy in practice, with implications for policy analysis and computational social science.
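As a rough illustration of the sliding-window step mentioned above, the following minimal Python sketch splits a long text into overlapping windows of consecutive sentences. The window size, the stride, the regex-based sentence splitter, and the function names are all assumptions made for this example; they are not taken from DALiSM. In a full pipeline, each window would presumably be passed to an argument detector and then embedded and clustered, which is not shown here.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sliding_windows(sentences: List[str], size: int = 3, stride: int = 1) -> List[str]:
    """Return overlapping windows of `size` consecutive sentences, joined by spaces.

    `size` and `stride` are hypothetical parameters chosen for illustration.
    If the text has at most `size` sentences, a single window holding all of
    them is returned. With stride > 1, trailing sentences that do not fill a
    complete window are dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not sentences:
        return []
    if len(sentences) <= size:
        return [" ".join(sentences)]
    last_start = len(sentences) - size
    return [
        " ".join(sentences[i:i + size])
        for i in range(0, last_start + 1, stride)
    ]


if __name__ == "__main__":
    speech = "A one. B two! C three? D four."
    for window in sliding_windows(split_sentences(speech), size=2, stride=1):
        print(window)
```

Overlapping windows are one way to avoid cutting an argument in half at an arbitrary boundary, at the cost of processing some sentences more than once.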
Abstract
How can we capture the dynamics of deliberation in a debate? In an increasingly divided and misinformed world, understanding the relationship between who is arguing and what they are arguing about is becoming critical for fostering a meaningful exchange of ideas. Given the vast array of platforms on which people express their viewpoints and deliberate on issues, how can we develop methods to analyze these processes accurately? Fortunately, debate data is abundant, ranging from (a) formal proceedings, such as committee hearings in legislatures, to (b) online discussion forums, such as Reddit. Here we introduce DALiSM, a data-driven, argument-centric framework for analyzing discourse dynamics in diverse, multi-party spaces at scale. We develop methods that harness and extend the state of the art in computational argumentation for (a) identifying arguments in long-form raw text, (b) calculating the intensity of deliberation, and (c) modeling the evolution of discourse over time. We deploy the framework as an interactive dashboard that lets users explore the outputs of DALiSM and understand the nature of a discourse event. To showcase the utility of DALiSM, we apply it to U.S. congressional committee hearings from 2005 to 2023 (109th to 117th Congresses) and to selected Reddit communities from 2008 to 2023. This case study reveals substantive insights into deliberative behavior in online and offline spaces.
