AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs

Maike Behrendt, Stefan Sylvius Wagner, Marc Ziegele, Lena Wilms, Anke Stoll, Dominique Heinbach, Stefan Harmeling

TL;DR

AQuA addresses the automated assessment of deliberative quality in online political discussions by combining predictions from 20 adapter-based deliberation facets into a single, interpretable score. The method learns adapter-specific predictions for each facet, then weights them using correlations between expert annotations and crowd judgments to form a normalized score s_AQuA(x) in the range [0,5]. The approach demonstrates transferability to unseen datasets (SOCC and Europolis) and yields strong alignment with theoretical expectations about deliberation indicators, while enabling transparent, component-wise reasoning. Code and adapter weights are made available, highlighting practical applicability for large-scale analysis of online discourse and policy-relevant insights for researchers and practitioners. The work advances computational social science by integrating expert and non-expert perspectives into an interpretable, scalable framework for measuring deliberative quality.
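
The weighting and normalization step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name `aqua_score`, the assumption that each adapter prediction lies in `[0, pred_max]`, and the min-max normalization over the attainable raw-score range are all assumptions made for this example, and the weights in the usage note are invented.

```python
from typing import Sequence


def aqua_score(
    predictions: Sequence[float],
    weights: Sequence[float],
    pred_max: float = 1.0,
    scale: float = 5.0,
) -> float:
    """Weighted sum of per-facet predictions, rescaled to [0, scale].

    Assumption: every prediction lies in [0, pred_max]. The normalization
    used here (min-max over the attainable raw range) is illustrative.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")

    # Additive score: each facet contributes its prediction times its weight.
    raw = sum(w * p for w, p in zip(weights, predictions))

    # Lowest raw score: only negatively weighted facets fire at pred_max.
    lowest = sum(min(w, 0.0) * pred_max for w in weights)
    # Highest raw score: only positively weighted facets fire at pred_max.
    highest = sum(max(w, 0.0) * pred_max for w in weights)
    if highest == lowest:
        raise ValueError("all weights are zero; the score is undefined")

    return scale * (raw - lowest) / (highest - lowest)
```

With three hypothetical facets weighted `[0.5, -0.25, 0.25]`, a comment that triggers only the two positively weighted facets gets the maximum score, and one that triggers only the negatively weighted facet gets the minimum. Because the score is additive, the individual terms `w * p` remain available for inspecting which facets drove the result.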

Abstract

Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weigh the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
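
To make the weighting idea concrete, the sketch below derives one weight per deliberative index by correlating expert annotations with non-expert ratings. The abstract does not say which correlation coefficient is used, so Pearson's r is an assumption here. The function names, the data layout (one row per comment, one column per index) and the numeric crowd rating are hypothetical, and the example data in the usage note is invented.

```python
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences.

    Returns 0.0 when either sequence is constant, since the coefficient
    is undefined in that case (an illustrative choice).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_weights(
    expert_labels: Sequence[Sequence[float]],
    crowd_ratings: Sequence[float],
) -> List[float]:
    """One weight per index: its correlation with the crowd rating.

    expert_labels[i][k] is the expert annotation of index k on comment i;
    crowd_ratings[i] is the non-expert rating of comment i.
    """
    columns = list(zip(*expert_labels))  # transpose: rows -> index columns
    return [pearson(column, crowd_ratings) for column in columns]
```

For four comments annotated on three indices, `correlation_weights([[1, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 1]], [4, 1, 5, 2])` gives a positive weight to the first index, a negative weight of the same size to the second, and a weight of zero to the third, which never varies.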

Paper Structure

This paper contains 22 sections, 7 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: AQuA calculates a single score for deliberativeness from weighted adapter predictions on 20 different deliberative aspects. The adapter predictions are weighted by the correlation coefficients between each deliberative aspect and the perception of crowd workers about whether a comment is deliberative or not. The normalized score can then be used to compare the deliberative quality of individual comments.
  • Figure 2: For the individual adapter predictions, we use a Transformer-based model with adapter layers inserted after the feed-forward layer of the Transformer, as proposed by Pfeiffer et al. (2021).
  • Figure 3: Europolis. Plotting AQuA scores (y-axis) against comment length (x-axis, word count) rules out comment length alone as the factor behind a high AQuA score.