Inference to the Best Explanation in Large Language Models
Dhairya Dalal, Marco Valentino, André Freitas, Paul Buitelaar
TL;DR
The paper addresses how to interpret and automatically evaluate LLM-generated explanations by treating them as competing candidates to be judged under Inference to the Best Explanation (IBE). It introduces IBE-Eval, which computes four criteria (consistency, parsimony, coherence, and uncertainty) and combines them with a linear model to select the most plausible explanation in causal question answering tasks. Across COPA and E-CARE, IBE-Eval achieves up to 77% accuracy, outperforms a GPT-3.5 judge baseline, and correlates significantly with human judgment, with linguistic uncertainty standing out as a predictor. The study highlights the potential of automated, interpretable explanation verification while acknowledging limitations in factual grounding and domain scope, and offers a basis for future explanatory evaluation tools.
Abstract
While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE), to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked with selecting the most plausible causal explanation among competing ones generated by LLMs (i.e., GPT-3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈ 27% above random), improving upon a GPT-3.5-as-a-Judge baseline (≈ +17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variance, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for the future development of automated explanation verification tools.
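
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate explanation has already been scored on the four criteria, and it combines those scores with a linear function before picking the highest-scoring candidate. Everything specific here is an assumption made for illustration and is not taken from the paper: the names `Explanation`, `WEIGHTS`, `plausibility`, and `select_best`, the hand-set weight values and their signs, the score ranges, and the toy feature values. In the paper the linear model is fitted to data rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    """One candidate explanation with pre-computed criterion scores.

    The score ranges below are assumptions for this sketch only.
    """
    text: str
    consistency: float  # assumed: 1.0 if judged logically consistent, else 0.0
    parsimony: float    # assumed: higher means a simpler explanation
    coherence: float    # assumed: higher means more coherent reasoning steps
    uncertainty: float  # assumed: higher means more hedging language


# Hypothetical hand-set weights; a fitted linear model would learn these.
WEIGHTS = {
    "consistency": 1.0,
    "parsimony": 0.5,
    "coherence": 0.5,
    "uncertainty": -1.0,  # assumed: more hedging lowers plausibility
}
BIAS = 0.0


def plausibility(explanation: Explanation) -> float:
    """Linear combination of the four criterion scores."""
    return BIAS + sum(
        weight * getattr(explanation, name) for name, weight in WEIGHTS.items()
    )


def select_best(candidates: list[Explanation]) -> Explanation:
    """Return the candidate with the highest plausibility score."""
    return max(candidates, key=plausibility)


# Toy candidates with made-up scores, for illustration only.
candidates = [
    Explanation("Candidate A", consistency=1.0, parsimony=0.8,
                coherence=0.7, uncertainty=0.1),
    Explanation("Candidate B", consistency=1.0, parsimony=0.4,
                coherence=0.5, uncertainty=0.6),
]
best = select_best(candidates)
print(best.text)
```

Because the score is a weighted sum, each criterion's contribution to the final choice can be read directly from its weight, which is one way to understand the interpretability claim made for this kind of model.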
