Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
Danielle Villa, Maria Chang, Keerthiram Murugesan, Rosario Uceda-Sosa, Karthikeyan Natesan Ramamurthy
TL;DR
The paper tackles the problem of unfaithful or inconsistent explanations produced by large language models. It introduces Cross-Examiner, a neuro-symbolic pipeline that extracts structured triples from questions and explanations, discovers patterns to generate yes/no follow-up questions, and uses a consistency checker to detect contradictions. Through extensive human evaluation on the BBQ fairness dataset, the authors demonstrate that combining symbolic information extraction with targeted question writing yields higher-quality follow-ups and more reliable inconsistency detection than purely LLM-based approaches, with ablations highlighting the value of the symbolic steps. The work advances trustworthy AI explanations by providing a flexible, model-agnostic framework for systematic explanation validation and lays out clear avenues for scaling and refinement.
Abstract
Large Language Models (LLMs) are often asked to explain their outputs to enhance accuracy and transparency. However, evidence suggests that these explanations can misrepresent the models' true reasoning processes. One effective way to identify inaccuracies or omissions in these explanations is through consistency checking, which typically involves asking follow-up questions. This paper introduces Cross-Examiner, a new method for generating follow-up questions based on a model's explanation of an initial question. Our method combines symbolic information extraction with language model-driven question generation, resulting in better follow-up questions than those produced by LLMs alone. Additionally, this approach is more flexible than other methods and can generate a wider variety of follow-up questions.
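
To make the idea of consistency checking through follow-up questions concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the triples have already been extracted from an explanation, and it uses a single fixed question template where the paper uses pattern discovery and a language model to write questions. All names here (`Triple`, `follow_up_questions`, `find_inconsistencies`, `stub_model`) are hypothetical, and the example triples are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) fact taken from an explanation."""
    subject: str
    relation: str
    obj: str


def follow_up_questions(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Turn each triple into a yes/no question paired with the answer the
    explanation implies. This toy template always implies "yes"."""
    return [
        (f"Is it true that {t.subject} {t.relation} {t.obj}?", "yes")
        for t in triples
    ]


def find_inconsistencies(
    questions: List[Tuple[str, str]],
    answer_fn: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Ask every follow-up and return (question, expected, actual) for each
    answer that contradicts what the explanation implied."""
    inconsistent = []
    for question, expected in questions:
        actual = answer_fn(question).strip().lower()
        if actual != expected:
            inconsistent.append((question, expected, actual))
    return inconsistent


# Hand-written triples stand in for the symbolic extraction step.
triples = [
    Triple("the grandson", "was struggling with", "the phone app"),
    Triple("the grandfather", "was helping", "the grandson"),
]


def stub_model(question: str) -> str:
    """Stand-in for the model under test; denies one of its own claims."""
    return "No" if "helping" in question else "Yes"


questions = follow_up_questions(triples)
contradictions = find_inconsistencies(questions, stub_model)
```

In this sketch, `contradictions` holds the single follow-up about the grandfather helping, because the stub model answers "no" to a statement its own explanation asserted. A real setup would replace `stub_model` with a call to the model being examined and would compare answers with something more tolerant than exact string matching.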
