"I Never Said That": A dataset, taxonomy and baselines on response clarity classification

Konstantinos Thomas; Giorgos Filandrianos; Maria Lymperaiou; Chrysoula Zerva; Giorgos Stamou

TL;DR

This work defines a new task: automatically classifying the clarity of responses in political interviews. It introduces a two-level taxonomy (3 high-level clarity classes with 9 evasion sub-categories) and a dataset of 3,445 QA pairs built from presidential interviews using ChatGPT-assisted question decomposition and human validation. Experiments across encoders, LLMs, prompting strategies, and LoRA-based tuning show that fine-tuned LLMs and evasion-based labeling yield strong performance, with model knowledge and grounding considerations shaping results. The dataset and taxonomy support scalable analysis of political discourse and long-context reasoning in NLP; the authors acknowledge limitations and outline directions for multilingual generalization and deeper discourse studies.

Abstract

Equivocation and ambiguity in public speech are well-studied discourse phenomena, especially in political science and analysis of political interviews. Inspired by the well-grounded theory on equivocation, we aim to resolve the closely related problem of response clarity in questions extracted from political interviews, leveraging the capabilities of Large Language Models (LLMs) and human expertise. To this end, we introduce a novel taxonomy that frames the task of detecting and classifying response clarity and a corresponding clarity classification dataset which consists of question-answer (QA) pairs drawn from political interviews and annotated accordingly. Our proposed two-level taxonomy addresses the clarity of a response in terms of the information provided for a given question (high-level) and also provides a fine-grained taxonomy of evasion techniques that relate to unclear, ambiguous responses (lower-level). We combine ChatGPT and human annotators to collect, validate and annotate discrete QA pairs from political interviews, to be used for our newly introduced response clarity task. We provide a detailed analysis and conduct several experiments with different model architectures, sizes and adaptation methods to gain insights and establish new baselines over the proposed dataset and task.
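The two-level structure described in the abstract can be sketched as a mapping from fine-grained evasion labels to high-level clarity classes. This is a minimal illustration, not the authors' code: the label names below are assumptions that are not listed on this page and should be checked against the paper's taxonomy figure, and the helper names `clarity_from_evasion` and `clarity_distribution` are hypothetical.

```python
from collections import Counter

# Assumed label names (not given on this page): 9 evasion-level labels,
# each belonging to exactly one of 3 high-level clarity classes.
EVASION_TO_CLARITY = {
    "Explicit": "Clear Reply",
    "Implicit": "Ambivalent",
    "Dodging": "Ambivalent",
    "General": "Ambivalent",
    "Deflection": "Ambivalent",
    "Partial/half-answer": "Ambivalent",
    "Declining to answer": "Clear Non-Reply",
    "Claims ignorance": "Clear Non-Reply",
    "Clarification": "Clear Non-Reply",
}


def clarity_from_evasion(evasion_label: str) -> str:
    """Map a fine-grained evasion label to its high-level clarity class."""
    return EVASION_TO_CLARITY[evasion_label]


def clarity_distribution(evasion_labels):
    """Count high-level clarity classes implied by a list of evasion labels."""
    return Counter(clarity_from_evasion(label) for label in evasion_labels)


if __name__ == "__main__":
    predictions = ["Explicit", "Dodging", "Claims ignorance", "General"]
    print(clarity_distribution(predictions))
```

Because every evasion label has a single parent class, a model that predicts only the lower level can be scored on the higher level by applying this mapping to its outputs.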

Paper Structure (56 sections, 2 equations, 24 figures, 23 tables)

This paper contains 56 sections, 2 equations, 24 figures, 23 tables.

Introduction
Related work
Equivocation in Social Sciences
Equivocation in NLP
Answerability in question answering
Discourse analysis of political speech
Proposed Taxonomy
Dataset creation
Human annotation process
Validation set & inter-annotator agreement
Handling disagreements
Exploratory data analysis
Experiments
Experimental setup
Modeling variants
...and 41 more sections

Figures (24)

Figure 1: An example from an interview from our dataset with classification along with an analysis from instruction-tuned Llama-70b.
Figure 2: Statistics on answer clarity in political interviews of the latest 4 US presidents.
Figure 3: Our proposed taxonomy of response clarity classification.
Figure 4: Annotators' agreement using Fleiss $\kappa$ for labels assigned to the 'evasion' classification level.
Figure 5: Label distribution in the dataset.
...and 19 more figures