"I Never Said That": A dataset, taxonomy and baselines on response clarity classification
Konstantinos Thomas, Giorgos Filandrianos, Maria Lymperaiou, Chrysoula Zerva, Giorgos Stamou
TL;DR
This work defines a new task: automatically classifying the clarity of responses in political interviews. It introduces a two-level taxonomy (three high-level clarity classes and nine evasion sub-categories) and a dataset of 3,445 QA pairs built from presidential interviews, using ChatGPT-assisted decomposition into discrete QA pairs followed by human validation. Experiments across encoders, LLMs, prompting strategies, and LoRA-based tuning show that fine-tuned LLMs and evasion-based labeling perform strongly, and that model knowledge and grounding also affect results. The study offers a resource and framework for scalable political discourse analysis and long-context reasoning in NLP, acknowledges its limitations, and outlines directions for multilingual generalization and deeper discourse studies.
Abstract
Equivocation and ambiguity in public speech are well-studied discourse phenomena, especially in political science and analysis of political interviews. Inspired by the well-grounded theory on equivocation, we aim to resolve the closely related problem of response clarity in questions extracted from political interviews, leveraging the capabilities of Large Language Models (LLMs) and human expertise. To this end, we introduce a novel taxonomy that frames the task of detecting and classifying response clarity and a corresponding clarity classification dataset which consists of question-answer (QA) pairs drawn from political interviews and annotated accordingly. Our proposed two-level taxonomy addresses the clarity of a response in terms of the information provided for a given question (high-level) and also provides a fine-grained taxonomy of evasion techniques that relate to unclear, ambiguous responses (lower-level). We combine ChatGPT and human annotators to collect, validate and annotate discrete QA pairs from political interviews, to be used for our newly introduced response clarity task. We provide a detailed analysis and conduct several experiments with different model architectures, sizes and adaptation methods to gain insights and establish new baselines over the proposed dataset and task.
