Table of Contents
Fetching ...

Reasoning-Grounded Natural Language Explanations for Language Models

Vojtech Cahlik, Rodrigo Alves, Pavel Kordik

TL;DR

This work addresses the faithfulness gap in natural language explanations for large language models by grounding both answers and explanations in a compact reasoning sequence, enabling the model to generate outputs from a reasoning trace that is later decoded into natural language. It introduces a joint predict-explain framework where the reasoning path informs both outputs while ensuring the two are inferred independently to prevent fabrication. Through experiments across logistic regression, decision trees, and NL-encoded trees, the authors show strong alignment between the final answers and explanations when reasoning is used, and they also observe improvements in answer quality. The approach offers a practical, resource-efficient pathway to more transparent LLM reasoning with potential for broader application in explainability and controllable AI tasks.

Abstract

We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of tokens, the outputs of the reasoning process can become part of the model context and later be decoded to natural language as the model produces either the final answer or the explanation. To improve the faithfulness of the explanations, we propose to use a joint predict-explain approach, in which the answers and explanations are inferred directly from the reasoning sequence, without the explanations being dependent on the answers and vice versa. We demonstrate the plausibility of the proposed technique by achieving a high alignment between answers and explanations in several problem domains, observing that language models often simply copy the partial decisions from the reasoning sequence into the final answers or explanations. Furthermore, we show that the proposed use of reasoning can also improve the quality of the answers.

Reasoning-Grounded Natural Language Explanations for Language Models

TL;DR

This work addresses the faithfulness gap in natural language explanations for large language models by grounding both answers and explanations in a compact reasoning sequence, enabling the model to generate outputs from a reasoning trace that is later decoded into natural language. It introduces a joint predict-explain framework where the reasoning path informs both outputs while ensuring the two are inferred independently to prevent fabrication. Through experiments across logistic regression, decision trees, and NL-encoded trees, the authors show strong alignment between the final answers and explanations when reasoning is used, and they also observe improvements in answer quality. The approach offers a practical, resource-efficient pathway to more transparent LLM reasoning with potential for broader application in explainability and controllable AI tasks.

Abstract

We propose a large language model explainability technique for obtaining faithful natural language explanations by grounding the explanations in a reasoning process. When converted to a sequence of tokens, the outputs of the reasoning process can become part of the model context and later be decoded to natural language as the model produces either the final answer or the explanation. To improve the faithfulness of the explanations, we propose to use a joint predict-explain approach, in which the answers and explanations are inferred directly from the reasoning sequence, without the explanations being dependent on the answers and vice versa. We demonstrate the plausibility of the proposed technique by achieving a high alignment between answers and explanations in several problem domains, observing that language models often simply copy the partial decisions from the reasoning sequence into the final answers or explanations. Furthermore, we show that the proposed use of reasoning can also improve the quality of the answers.

Paper Structure

This paper contains 30 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of our methodology. As a first step, we gather a conversational dataset in which for each user input, the triplet of reasoning-answer-explanation ground truths is present. In the second step, we fine-tune a conversational GPT model on the dataset from step 1. As a last step, we perform inference using the fine-tuned model by first computing a reasoning sequence and then including it in the conversation to produce the final answer or explanation, which are obtained independently of each other.
  • Figure 2: Experiments with joint training of answers and explanations on a decision tree dataset. The colored regions correspond to ground-truth classes. When reasoning is used, answer and explanation classification errors are typically near-perfectly aligned.
  • Figure 3: Classification accuracies for experiments with decision trees of various depths