Graph-Guided Textual Explanation Generation Framework
Shuzhou Yuan, Jingyi Sun, Ran Zhang, Michael Färber, Steffen Eger, Pepa Atanasova, Isabelle Augenstein
TL;DR
This work tackles the faithfulness gap in natural language explanations (NLEs) for model predictions. It introduces G-Tex, a Graph-Guided Textual Explanation Generation framework that extracts faithful highlight explanations, constructs a graph from them, and uses a Graph Neural Network to guide NLE generation within a self-rationalization model. The framework is evaluated on three reasoning datasets (e-SNLI, ComVE, ECQA) with T5 and BART as base models, showing improvements of up to 12.18% in NLE faithfulness, closer alignment with human-written explanations, and less redundant content in human evaluations. The results suggest that explicitly modeling the model's reasoning as a graph and guiding NLE generation with these cues yields more faithful and coherent explanations, offering a foundation for addressing broader criteria of NLE quality.
Abstract
Natural language explanations (NLEs) are commonly used to provide plausible free-text explanations of a model's reasoning about its predictions. However, recent work has questioned their faithfulness, as they may not accurately reflect the model's internal reasoning process regarding its predicted answer. In contrast, highlight explanations (input fragments critical for the model's predicted answers) exhibit measurable faithfulness. Building on this foundation, we propose G-Tex, a Graph-Guided Textual Explanation Generation framework designed to enhance the faithfulness of NLEs. Specifically, highlight explanations are first extracted as faithful cues reflecting the model's reasoning logic toward answer prediction. They are subsequently encoded through a graph neural network layer to guide the NLE generation, which aligns the generated explanations with the model's underlying reasoning toward the predicted answer. Experiments on T5 and BART using three reasoning datasets show that G-Tex improves NLE faithfulness by up to 12.18% compared to baseline methods. Additionally, G-Tex generates NLEs with greater semantic and lexical similarity to human-written ones. Human evaluations show that G-Tex can decrease redundant content and enhance the overall quality of NLEs. Our work presents a novel method for explicitly guiding NLE generation to enhance faithfulness, serving as a foundation for addressing broader criteria in NLE and generated text.
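
The two steps named in the abstract (extracting highlights, then encoding them through a graph layer) can be illustrated with a toy sketch. This is not the G-Tex implementation: the importance scores, the top-k selection rule, the choice to connect every pair of highlight tokens, and the mean-aggregation update are all assumptions made here for illustration, and the function names are hypothetical. In the framework described above, the highlights come from a faithful explanation method and the graph layer sits inside a T5 or BART self-rationalization model.

```python
from itertools import combinations


def top_k_highlights(scores, k):
    """Return the indices of the k highest-scoring tokens, in input order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def build_highlight_graph(num_tokens, highlights):
    """Adjacency sets: a self-loop on every token, plus an edge between
    every pair of highlight tokens (an assumed connectivity rule)."""
    neighbors = {i: {i} for i in range(num_tokens)}
    for a, b in combinations(highlights, 2):
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def mean_aggregate(features, neighbors):
    """One message-passing step: each token's new vector is the mean of the
    vectors of its neighbours (including itself via the self-loop)."""
    dim = len(features[0])
    updated = []
    for i in range(len(features)):
        nbrs = sorted(neighbors[i])
        updated.append(
            [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        )
    return updated


# Hypothetical input: four tokens, made-up importance scores and 2-d features.
tokens = ["a", "man", "is", "sleeping"]
scores = [0.1, 0.7, 0.05, 0.9]
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [4.0, 0.0]]

highlights = top_k_highlights(scores, k=2)
graph = build_highlight_graph(len(tokens), highlights)
updated = mean_aggregate(features, graph)

print([tokens[i] for i in highlights])
print(updated)
```

In this sketch only the highlighted tokens exchange information, so their representations are pulled toward each other while the remaining tokens are left unchanged. That is one simple way to picture how highlight cues could shape the representations a decoder later conditions on.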
