On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis
Maciej Uberna, Michał Wawer, Jarosław A. Chudziak, Marcin Koszowy
TL;DR
The paper addresses how the functions of reformulation in argumentative discourse can be analyzed beyond surface-level paraphrase. It proposes a theoretically grounded multi-agent system (MAS) enhanced with Retrieval-Augmented Generation (RAG) and compares it against an otherwise identical zero-shot baseline; both classify rephrases into a five-category taxonomy (D-I-S-G-O) on a gold-standard US2016 debate corpus. Empirically, the RAG-enhanced MAS substantially outperforms the zero-shot baseline (macro F1 ~0.67 vs ~0.27; MCC ~0.64 vs ~0.16), with the largest gains in Generalisation and other pragmatic functions, suggesting that explicit theory improves function-aware discourse analysis. The work demonstrates a scalable, interpretable framework for identifying rhetorical strategies in contemporary discourse, with potential applications in misinformation detection and online harms mitigation.
Abstract
Identifying the strategic uses of reformulation in discourse remains a key challenge for computational argumentation. While LLMs can detect surface-level similarity, they often fail to capture the pragmatic functions of rephrasing, such as its role within rhetorical discourse. This paper presents a comparative multi-agent framework designed to quantify the benefits of incorporating explicit theoretical knowledge for this task. We utilise a dataset of annotated political debates to establish a new standard encompassing four distinct rephrase functions (Deintensification, Intensification, Specification, and Generalisation) plus an Other category that covers all remaining types (D-I-S-G-O). We then evaluate two parallel LLM-based agent systems: one enhanced by argumentation theory via Retrieval-Augmented Generation (RAG), and an otherwise identical zero-shot baseline. The results reveal a clear performance gap: the RAG-enhanced agents substantially outperform the baseline across the board, with particularly strong advantages in detecting Intensification and Generalisation, yielding an overall Macro F1-score improvement of nearly 30%. Our findings provide evidence that theoretical grounding is not only beneficial but essential for advancing beyond mere paraphrase detection towards function-aware analysis of argumentative discourse. This comparative multi-agent architecture represents a step towards scalable, theoretically informed computational tools capable of identifying rhetorical strategies in contemporary discourse.
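For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how macro F1 and the multiclass Matthews correlation coefficient (MCC) can be computed over D-I-S-G-O labels. It is an illustration only and not the authors' evaluation code: the function names `macro_f1` and `multiclass_mcc`, the single-letter label encoding, and the toy label lists are all assumptions made for this example, and the toy data is invented rather than drawn from the corpus.

```python
from collections import Counter
from math import sqrt

# Hypothetical single-letter encoding of the D-I-S-G-O taxonomy.
LABELS = ["D", "I", "S", "G", "O"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred, labels=LABELS):
    """Multiclass generalisation of MCC computed from class counts."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    denom = sqrt(cov_tt * cov_pp)
    # Degenerate case: all gold labels or all predictions fall in one class.
    return cov_tp / denom if denom else 0.0


# Invented toy data, for illustration only.
toy_true = ["D", "I", "S", "G", "O", "G"]
toy_pred = ["D", "I", "S", "O", "O", "G"]
toy_macro_f1 = macro_f1(toy_true, toy_pred)
toy_mcc = multiclass_mcc(toy_true, toy_pred)
```

Macro averaging gives each of the five categories equal weight regardless of frequency, while MCC takes the full confusion matrix into account, so the two metrics give complementary views of performance on an imbalanced label set.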
