AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
Maike Behrendt, Stefan Sylvius Wagner, Marc Ziegele, Lena Wilms, Anke Stoll, Dominique Heinbach, Stefan Harmeling
TL;DR
AQuA addresses the automated assessment of deliberative quality in online political discussions by combining the predictions of 20 adapter models, one per deliberative index, into a single, interpretable score. Each adapter predicts its index for a given post, and the predictions are then weighted by the correlations between expert annotations and crowd judgments to form a normalized score s_AQuA(x) in the range [0,5]. The score transfers to datasets not seen during training (SOCC and Europolis) and agrees with theoretical expectations about deliberation indicators, while its additive form keeps each component's contribution visible. Code and adapter weights are made available, which supports large-scale analysis of online discourse by researchers and practitioners. The work contributes to computational social science by integrating expert and non-expert perspectives into an interpretable, scalable framework for measuring deliberative quality.
Abstract
Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weight the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
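To make the idea of an additive, correlation-weighted score concrete, the following is a minimal sketch and not the released AQuA implementation. It assumes each adapter outputs a binary prediction (1 if the index is present in a post, 0 otherwise), that the weights are correlation coefficients that may be negative, and that the raw sum is min-max rescaled to [0, 5]. The three index names, their weight values, and the function names are hypothetical and chosen only for illustration; the actual score uses 20 indices.

```python
from typing import Dict

# Hypothetical weights for three illustrative indices (assumed values,
# standing in for expert/non-expert correlation coefficients).
WEIGHTS: Dict[str, float] = {
    "justification": 0.5,
    "question": 0.25,
    "insult": -0.25,
}


def contributions(predictions: Dict[str, int],
                  weights: Dict[str, float]) -> Dict[str, float]:
    """Per-index contribution to the raw score (weight times prediction)."""
    return {name: weights[name] * predictions[name] for name in weights}


def raw_score(predictions: Dict[str, int],
              weights: Dict[str, float]) -> float:
    """Additive score: the sum of all per-index contributions."""
    return sum(contributions(predictions, weights).values())


def normalized_score(predictions: Dict[str, int],
                     weights: Dict[str, float],
                     upper: float = 5.0) -> float:
    """Rescale the raw score to [0, upper] (assumed min-max normalization).

    With binary predictions, the lowest reachable raw score is the sum of
    the negative weights and the highest is the sum of the positive ones.
    """
    lowest = sum(w for w in weights.values() if w < 0)
    highest = sum(w for w in weights.values() if w > 0)
    if highest == lowest:
        raise ValueError("weights must not all be zero")
    return upper * (raw_score(predictions, weights) - lowest) / (highest - lowest)


if __name__ == "__main__":
    post = {"justification": 1, "question": 0, "insult": 0}
    print(contributions(post, WEIGHTS))
    print(normalized_score(post, WEIGHTS))
```

Because the score is a plain weighted sum, the per-index contributions returned by `contributions` show which deliberative aspects raised or lowered a post's score, which is the transparency property the abstract refers to.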
