Incongruence Identification in Eyewitness Testimony
Akshara Nair, Zeba Afroz, Md Shad Akhtar
TL;DR
This work introduces a novel task: incongruence detection in eyewitness testimonies, focusing on span-level contradictions across multiple witnesses. It presents MIND, a large-scale dataset of context–testimony pairs with annotated incongruent spans, and INTEND, an instruction-tuned framework that combines 6Ws prompts with multi-hop reasoning to detect incongruences and extract spans. Empirical results show INTEND outperforms baselines across MLM and LLM paradigms, achieving a peak F1 of $0.75$ on span identification and a notable improvement of $+5.63$ percentage points over strong baselines, with human evaluators validating span quality. The approach enhances explainability in testimony analysis and offers a principled pathway to robust, context-aware deception detection in investigative settings.
Abstract
Incongruence detection in eyewitness narratives is critical for assessing the reliability of testimonies, yet traditional approaches often fail to address the nuanced inconsistencies inherent in such accounts. In this paper, we introduce a novel task of incongruence detection in eyewitness testimonies. Given a pair of testimonies consisting of multiple question-answer pairs from two subjects, we identify contextually related incongruences between the two subjects and mark the spans of incongruence in the utterances. To this end, we develop MIND (MultI-EyewitNess Deception), a comprehensive dataset of 2927 pairs of contextually related answers designed to capture both explicit and implicit contradictions. We also propose INTEND (INstruction-TunEd iNcongruity Detection), a framework based on 6W and multi-hop reasoning. Drawing on investigative techniques, INTEND addresses the task as a cloze-style problem, probing for contradictions in the who, what, when, where, and why aspects of the content. Our findings show that prompt tuning, especially with our framework, improves the detection of incongruences by a margin of +5.63 percent. We compare our approach with multiple fine-tuning and prompt-tuning techniques on MLMs and LLMs. Empirical results demonstrate a convincing improvement in F1-score over fine-tuned and regular prompt-tuning techniques, highlighting the effectiveness of our approach.
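To make the span-marking task concrete, the following is a minimal sketch of how a predicted incongruent span might be scored against an annotated one. It assumes a token-overlap F1 of the kind commonly used for span extraction; the abstract does not specify the exact metric, so the function name `token_f1`, the whitespace tokenization, and the example spans are illustrative assumptions rather than the authors' evaluation code.

```python
from collections import Counter


def token_f1(predicted_span: str, gold_span: str) -> float:
    """Token-overlap F1 between a predicted and a gold span.

    Assumption: lowercase whitespace tokenization; the paper's own
    metric and tokenizer may differ.
    """
    pred_tokens = predicted_span.lower().split()
    gold_tokens = gold_span.lower().split()

    # If either span is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans, not taken from MIND.
score = token_f1("at nine pm", "around nine pm")
```

In this hypothetical case two of the three tokens overlap in each direction, so precision and recall are both 2/3. A partial match therefore earns partial credit instead of being counted as a miss, which is the usual motivation for overlap-based span scoring.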
