Inference to the Best Explanation in Large Language Models

Dhairya Dalal; Marco Valentino; André Freitas; Paul Buitelaar

TL;DR

The paper addresses how to interpret and automatically evaluate LLM-generated explanations by framing explanations as candidates to be judged under Inference to the Best Explanation (IBE). It introduces IBE-Eval, which computes four criteria—consistency, parsimony, coherence, and uncertainty—and combines them with a linear model to select the most plausible explanation in causal question answering tasks. Across COPA and E-CARE, IBE-Eval achieves up to 77% accuracy, outperforms a GPT-3.5 judge baseline, and shows strong alignment with human judgment, particularly through linguistic uncertainty as a predictor. The study highlights the potential of automated, interpretable explanation verification while acknowledging limitations like factual grounding and domain scope, offering a basis for future development of explanatory evaluation tools.

Abstract

While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked with selecting the most plausible causal explanation among competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈ 27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈ +17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.
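The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate explanation has already been reduced to numeric scores for the four criteria, with parsimony represented by proof depth and concept drift as in the Figure 4 caption. The class name, field names, score scales, and weights below are all hypothetical and chosen only for illustration. This summary does not describe how the paper obtains its weights, so this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExplanationFeatures:
    """Hypothetical per-explanation scores; names and scales are assumptions."""
    consistency: float    # 1.0 if the explanation is logically consistent, else 0.0
    proof_depth: float    # parsimony proxy: lower is better
    concept_drift: float  # parsimony proxy: lower is better
    coherence: float      # higher is better
    uncertainty: float    # linguistic uncertainty: lower is better


# Illustrative weights only (assumed, not taken from the paper).
# Signs follow the figure captions: depth, drift, and uncertainty count against
# an explanation, while consistency and coherence count in its favour.
WEIGHTS: Dict[str, float] = {
    "consistency": 1.0,
    "proof_depth": -0.5,
    "concept_drift": -0.5,
    "coherence": 1.0,
    "uncertainty": -1.0,
}


def plausibility(features: ExplanationFeatures) -> float:
    """Linear combination of the criteria scores."""
    return sum(w * getattr(features, name) for name, w in WEIGHTS.items())


def select_best(candidates: Dict[str, ExplanationFeatures]) -> str:
    """Return the hypothesis whose explanation has the highest plausibility."""
    return max(candidates, key=lambda h: plausibility(candidates[h]))


if __name__ == "__main__":
    candidates = {
        "hypothesis_1": ExplanationFeatures(1.0, 2, 1, 0.8, 0.1),
        "hypothesis_2": ExplanationFeatures(1.0, 4, 3, 0.4, 0.6),
    }
    print(select_best(candidates))
```

Both candidates in the usage example are logically consistent, so the choice is driven by the remaining criteria. This mirrors the Figure 3 observation that consistency alone does not separate correct from incorrect answers.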

Paper Structure (45 sections, 6 equations, 19 figures, 1 table, 4 algorithms)

This paper contains 45 sections, 6 equations, 19 figures, 1 table, 4 algorithms.

Introduction
Inference to the Best Explanation (IBE)
Explanation Generation
Linguistic & Inference Criteria
Consistency.
Parsimony.
Coherence.
Uncertainty.
Inference to Best Explanation
Experimental Setting
COPA.
E-CARE.
LLMs.
Baselines.
Preliminary Analysis
...and 30 more sections

Figures (19)

Figure 1: IBE-Eval qualifies LLM-generated explanations with a set of logical and linguistic selection criteria to identify the most plausible hypothesis. The corresponding explanation for each hypothesis is evaluated across the IBE criteria of logical consistency, parsimony, internal coherence, and linguistic uncertainty. A final plausibility score is computed across those features and the hypothesis with highest score is identified as the best explanation.
Figure 2: A regression analysis measuring the correlation between IBE criteria and question accuracy. All the LLMs tend to conform to IBE expectations, with GPT 3.5 exhibiting the most consistent and significant alignment. Linguistic uncertainty is the strongest IBE predictor for explanation quality, where higher uncertainty is negatively correlated with question accuracy. Statistical significance is noted as: ‘***’ p < 0.001, ‘**’ p < 0.01, ‘*’ p < 0.05.
Figure 3: An evaluation of explanation consistency. LLMs are strong rationalizers and can generate logically consistent explanations at equal rates for explanations associated with both correct and incorrect answer options.
Figure 4: Explanation parsimony is evaluated using proof depth and concept drift. Both metrics are consistently lower for explanations supporting the correct answers, suggesting that LLMs are able to generate efficient explanations for the more plausible hypothesis.
Figure 5: An evaluation of explanation coherence and question accuracy. The average coherence score is consistently higher for explanations corresponding to the correct hypotheses across the LLMs.
...and 14 more figures
