Graph-Guided Textual Explanation Generation Framework

Shuzhou Yuan, Jingyi Sun, Ran Zhang, Michael Färber, Steffen Eger, Pepa Atanasova, Isabelle Augenstein

TL;DR

This work tackles the faithfulness gap in natural language explanations (NLEs) for model predictions. It introduces G-Tex, a Graph-Guided Textual Explanation Generation framework that extracts faithful highlight explanations, constructs a graph from them, and uses a graph neural network (GNN) layer to guide NLE generation within a self-rationalization model. The framework is evaluated on three reasoning datasets (e-SNLI, ComVE, ECQA) with T5 and BART as base models, showing improvements of up to 12.18% in NLE faithfulness over baseline methods, closer semantic and lexical similarity to human-written explanations, and less redundant content in human evaluations. The results suggest that encoding highlight explanations as a graph and using them to guide NLE generation yields explanations that better reflect the model's reasoning, offering a foundation for addressing broader criteria of NLE quality.

Abstract

Natural language explanations (NLEs) are commonly used to provide plausible free-text explanations of a model's reasoning about its predictions. However, recent work has questioned their faithfulness, as they may not accurately reflect the model's internal reasoning process regarding its predicted answer. In contrast, highlight explanations--input fragments critical for the model's predicted answers--exhibit measurable faithfulness. Building on this foundation, we propose G-Tex, a Graph-Guided Textual Explanation Generation framework designed to enhance the faithfulness of NLEs. Specifically, highlight explanations are first extracted as faithful cues reflecting the model's reasoning logic toward answer prediction. They are subsequently encoded through a graph neural network layer to guide the NLE generation, which aligns the generated explanations with the model's underlying reasoning toward the predicted answer. Experiments on T5 and BART using three reasoning datasets show that G-Tex improves NLE faithfulness by up to 12.18% compared to baseline methods. Additionally, G-Tex generates NLEs with greater semantic and lexical similarity to human-written ones. Human evaluations show that G-Tex can decrease redundant content and enhance the overall quality of NLEs. Our work presents a novel method for explicitly guiding NLE generation to enhance faithfulness, serving as a foundation for addressing broader criteria in NLE and generated text.

Paper Structure

This paper contains 46 sections, 6 equations, 3 figures, 17 tables.

Figures (3)

  • Figure 1: Faithfulness comparison between a self-rationalization model without (top) and with (bottom) the proposed G-Tex. Highlight explanations reveal the model's reasoning behind the predicted label with high faithfulness. Without G-Tex, these important tokens are omitted in the NLE while G-Tex guides the model to incorporate them in the generated NLE.
  • Figure 2: Illustration of our framework G-Tex, which consists of four key steps: (1) We train a base model such as T5 using the task-specific dataset for label prediction. (2) We extract three types of highlight explanations from the trained model. (3) We construct the graph structure based on the highlight explanations. (4) We integrate the graph structure into the model with a GNN layer and fine-tune the overall model for label prediction and NLE generation.
  • Figure 3: We generate three different types of post-hoc highlight explanations and use them to construct graph structures guiding the NLE generation within our framework. For simplicity, we present only a subset of the explanations for each type.
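
Steps (2) to (4) of Figure 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the top-k selection rule, the fully connected highlight graph, the unweighted mean aggregation, and all function names, scores, and embeddings below are assumptions chosen for illustration. The paper itself uses three types of highlight explanations and a trained GNN layer inside T5 or BART, which this toy example does not reproduce.

```python
from typing import List, Sequence


def top_k_highlights(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring tokens, in input order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def build_adjacency(num_tokens: int, highlights: Sequence[int]) -> List[List[float]]:
    """Connect every pair of highlighted tokens; each token also gets a self-loop.

    Assumption: the real graph construction depends on the highlight type.
    """
    adj = [[0.0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        adj[i][i] = 1.0
    for i in highlights:
        for j in highlights:
            adj[i][j] = 1.0
    return adj


def mean_aggregate(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One round of mean-neighbour message passing, without learned weights."""
    dim = len(feats[0])
    updated = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because of the self-loop
        updated.append(
            [sum(row[j] * feats[j][d] for j in range(len(feats))) / degree
             for d in range(dim)]
        )
    return updated


# Hypothetical input: tokens, made-up attribution scores, toy 2-d embeddings.
tokens = ["a", "man", "is", "sleeping", "outside"]
scores = [0.05, 0.40, 0.02, 0.90, 0.30]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]

highlights = top_k_highlights(scores, k=2)      # indices of "man" and "sleeping"
adj = build_adjacency(len(tokens), highlights)
updated = mean_aggregate(adj, feats)

for tok, vec in zip(tokens, updated):
    print(f"{tok:>9}: {vec}")
```

In this sketch the two highlighted tokens exchange information and end up with the same averaged representation, while non-highlighted tokens keep their original features. This is the sense in which the graph steers the representations later used for explanation generation toward the tokens the model relied on.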