Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov
TL;DR
This work addresses the automatic detection of check-worthy claims in political discourse and investigates whether audio is an informative modality beyond text. It introduces a 48-hour multimodal dataset (text and audio) derived from political events, along with a framework that aligns audio representations with a text teacher model and supports text+audio ensembles. The key findings are that fusing audio with text improves performance in multi-speaker scenarios, and that audio alone can outperform text in single-speaker cases. Alignment significantly boosts the audio representations, and ensemble models achieve the best overall mean average precision (MAP). The dataset and methods enable further multimodal research and have practical implications for moderators, journalists, and fact-checkers.
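For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of how it is commonly computed for a ranking task such as this one. It assumes that each event (for example, one debate) yields a list of sentences with binary check-worthiness labels and model scores, that average precision is computed per event, and that the results are averaged across events. The function names and the toy data are illustrative only and are not taken from the authors' code; the exact evaluation protocol is not specified in this summary.

```python
from typing import Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one ranked list.

    labels: 1 for a check-worthy sentence, 0 otherwise.
    scores: model scores; higher means ranked earlier.
    """
    # Rank sentences from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    # Convention assumed here: an event with no positives scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    events: Sequence[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Mean of per-event average precision (assumed per-event averaging)."""
    if not events:
        return 0.0
    return sum(average_precision(l, s) for l, s in events) / len(events)


# Toy data (hypothetical): two events with labels and scores.
toy_events = [
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    ([0, 1], [0.6, 0.4]),
]
toy_map = mean_average_precision(toy_events)
```

In this toy example, the first event has its positives at ranks 1 and 3 and the second has its only positive at rank 2, so a model that ranks check-worthy sentences earlier would raise the score.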
Abstract
Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help debate moderators, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech from past political debates in the USA. We then demonstrate experimentally that, in the case of multiple speakers, adding the audio modality yields sizable improvements over using the text modality alone; moreover, an audio-only model can outperform a text-only one for a single speaker. To enable future research, we make all our data and code publicly available at https://github.com/petar-iv/audio-checkworthiness-detection.
