Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

Yanda Chen; Chandan Singh; Xiaodong Liu; Simiao Zuo; Bin Yu; He He; Jianfeng Gao

TL;DR

This paper addresses the problem of inconsistent natural-language explanations produced by large language models. It introduces explanation-consistency finetuning (EC-finetuning), a two-step synthetic data augmentation method: an LLM first generates follow-up questions related to an initial (question, explanation) example, then answers them in a way that agrees with the initial explanation. Across multiple QA datasets, EC-finetuning improves explanation consistency (10.0% relative on the finetuning datasets, 4.5% relative on datasets not seen during finetuning) with modest accuracy gains, and it generalizes to out-of-distribution domains. Analyses show that consistency correlates with correctness and that the improvements are larger on correct predictions, which highlights EC-finetuning’s potential for helping users form accurate mental models of model behavior.

Abstract

Large language models (LLMs) often generate convincing, fluent explanations. Unlike humans, however, they often generate inconsistent explanations across different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" yet answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that a human can use them to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative improvement in explanation consistency on four finetuning datasets, and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at https://github.com/yandachen/explanation-consistency-finetuning .
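The sparrow/penguin example can be made concrete with a toy consistency score. The sketch below is only an illustration under stated assumptions: the `FollowUp` record, the `consistency` function, and the "Can eagles fly?" follow-up are hypothetical, and the score (fraction of follow-up questions where the model's answer matches the answer a reader would infer from the explanation) is an assumed simplification, not necessarily the metric the paper uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FollowUp:
    """Hypothetical record for one related question."""
    question: str
    implied_answer: str  # answer a reader would infer from the explanation
    model_answer: str    # answer the model actually gives


def consistency(follow_ups: List[FollowUp]) -> float:
    """Fraction of follow-ups whose model answer matches the implied answer.

    Assumed toy definition for illustration only.
    """
    if not follow_ups:
        raise ValueError("need at least one follow-up question")
    matches = sum(f.implied_answer == f.model_answer for f in follow_ups)
    return matches / len(follow_ups)


# Explanation given for "Can sparrows fly?": "all birds can fly".
# A reader would infer "yes" for every bird; the model disagrees on penguins.
follow_ups = [
    FollowUp("Can penguins fly?", implied_answer="yes", model_answer="no"),
    FollowUp("Can eagles fly?", implied_answer="yes", model_answer="yes"),
]
score = consistency(follow_ups)
```

Here one of the two follow-up answers contradicts the explanation, so the toy score is one half; a fully consistent explanation would score 1.0 under this assumed definition.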

Paper Structure (21 sections, 2 figures, 8 tables)

Introduction
Related work
Generating and improving natural-language explanations
Evaluating natural-language explanations
Method: EC-finetuning
Explanation-consistency Finetuning
Measuring consistency
Results
Experimental setup
Main result: EC-finetuning improves explanation consistency
EC-finetuning using only a single LLM
Analysis
EC-finetuning improves explanation consistency in different ways.
Inconsistent explanations suggest incorrect predictions.
EC-finetuning improves consistency more on correct predictions.
...and 6 more sections

Figures (2)

Figure 1: EC-finetuning adapts an LLM to provide explanations that are more consistent with a user's expectation of LLM answers across related questions.
Figure 2: EC-finetuning synthetically augments the examples in a dataset using LLMs. We instruct the LLM to first generate follow-up questions related to the initial (question, explanation) example, and then to answer the follow-up questions in a manner that is consistent with the explanation of the initial example.
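The two-step augmentation described in the Figure 2 caption can be sketched as follows. This is a minimal sketch, not the authors' implementation: the `LLM` callable interface, the `augment` function, the prompt wording, and the `toy_llm` stub are all hypothetical names introduced for illustration. A real pipeline would call an actual model and then finetune on the resulting examples.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def augment(question: str, explanation: str, llm: LLM) -> List[Dict[str, str]]:
    """Build synthetic follow-up examples for one (question, explanation) pair."""
    # Step 1: ask the LLM for follow-up questions related to the initial example.
    # The prompt text is an assumption, not taken from the paper.
    q_prompt = (
        "Write follow-up questions, one per line, related to this example.\n"
        f"Question: {question}\nExplanation: {explanation}\n"
    )
    follow_ups = [ln.strip() for ln in llm(q_prompt).splitlines() if ln.strip()]

    # Step 2: answer each follow-up in a way that agrees with the explanation.
    examples = []
    for follow_up in follow_ups:
        a_prompt = (
            "Answer the question so that it is consistent with the explanation.\n"
            f"Explanation: {explanation}\nQuestion: {follow_up}\n"
        )
        examples.append({"question": follow_up, "answer": llm(a_prompt).strip()})
    return examples


def toy_llm(prompt: str) -> str:
    """Stub standing in for a real model so the sketch runs offline."""
    if prompt.startswith("Write follow-up questions"):
        return "Can penguins fly?\nCan eagles fly?\n"
    return "yes"


augmented = augment("Can sparrows fly?", "all birds can fly", toy_llm)
```

With the stub, both follow-up questions are answered "yes", which agrees with the explanation "all birds can fly" even where the explanation itself is factually wrong. That matches the stated goal of the augmentation: the synthetic answers follow the explanation, so the finetuning data contains explanations a reader can rely on to predict answers to related questions.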
