Explaining Answers with Entailment Trees

Bhavana Dalvi, Peter Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, Peter Clark

TL;DR

The work tackles the challenge of explaining QA answers by rendering the underlying reasoning as multistep entailment trees rather than single rationales. It formalizes the task and creates EntailmentBank, the first dataset of multistep entailment trees, enabling training and evaluation of explanation generators. The authors introduce EntailmentWriters, report baseline performance on three progressively harder tasks, and discuss generalization across domains along with implications for interaction and debugging. Structured explanations of this kind aim to improve transparency, debugging, and user interaction in QA systems. EntailmentBank thus provides a foundation for richer, more verifiable explanations than traditional evidence snippets.

Abstract

Our goal, in the context of open-domain textual question-answering (QA), is to explain answers by showing the line of reasoning from what is known to the answer, rather than simply showing a fragment of textual evidence (a "rationale"). If this could be done, new opportunities for understanding and debugging the system's reasoning become possible. Our approach is to generate explanations in the form of entailment trees, namely a tree of multipremise entailment steps from facts that are known, through intermediate conclusions, to the hypothesis of interest (namely the question + answer). To train a model with this skill, we created ENTAILMENTBANK, the first dataset to contain multistep entailment trees. Given a hypothesis (question + answer), we define three increasingly difficult explanation tasks: generate a valid entailment tree given (a) all relevant sentences, (b) all relevant and some irrelevant sentences, or (c) a corpus. We show that a strong language model can partially solve these tasks, in particular when the relevant sentences are included in the input (e.g., 35% of trees for (a) are perfect), and with indications of generalization to other domains. This work is significant as it provides a new type of dataset (multistep entailments) and baselines, offering a new avenue for the community to generate richer, more systematic explanations.
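
To make the structure described in the abstract concrete, the following is a minimal sketch of how an entailment tree might be represented and checked for structural well-formedness. The class names (`Step`, `EntailmentTree`), the identifiers (`sent1`, `int1`, `hypothesis`), and the toy sentences are all illustrative assumptions, not the format used by ENTAILMENTBANK. The check is purely structural: it does not test whether each step is a valid entailment, which is the hard part of the task.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One multipremise entailment step: premises -> conclusion (node ids)."""
    premises: list[str]
    conclusion: str


@dataclass
class EntailmentTree:
    """Hypothetical container: input sentences plus an ordered list of steps."""
    sentences: dict[str, str]          # leaf id -> sentence text
    steps: list[Step]                  # ordered from leaves towards the root
    hypothesis_id: str = "hypothesis"  # id of the root node


def is_well_formed(tree: EntailmentTree) -> bool:
    """Structural check only; says nothing about whether steps truly entail."""
    available = set(tree.sentences)    # nodes that may be used as premises
    consumed = set()                   # nodes already used as a premise
    for step in tree.steps:
        if not step.premises:
            return False
        for premise in step.premises:
            # A premise must already exist and, in a tree, be used only once.
            if premise not in available or premise in consumed:
                return False
            consumed.add(premise)
        if step.conclusion in available:
            return False               # a conclusion must be a new node
        available.add(step.conclusion)
    if not tree.steps or tree.steps[-1].conclusion != tree.hypothesis_id:
        return False
    # Every intermediate conclusion must feed into a later step.
    intermediates = available - set(tree.sentences) - {tree.hypothesis_id}
    return intermediates <= consumed


def used_leaves(tree: EntailmentTree) -> set[str]:
    """Input sentences that actually appear in the tree (others are distractors)."""
    return {p for step in tree.steps for p in step.premises if p in tree.sentences}


# Toy input in the spirit of task (b): relevant sentences plus one distractor.
example = EntailmentTree(
    sentences={
        "sent1": "A is a kind of B.",
        "sent2": "Every B has property C.",
        "sent3": "Anything with property C can do D.",
        "sent4": "An unrelated sentence.",
    },
    steps=[
        Step(premises=["sent1", "sent2"], conclusion="int1"),       # A has property C
        Step(premises=["int1", "sent3"], conclusion="hypothesis"),  # A can do D
    ],
)
```

In this sketch, task (a) would correspond to an input where every sentence ends up in `used_leaves`, task (b) to an input like `example` where some sentences are left unused, and task (c) to retrieving the candidate sentences from a corpus before building the tree.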

Paper Structure

This paper contains 1 section, 1 figure, 1 table.

Table of Contents

  1. Introduction

Figures (1)

  • Figure 1: Given a hypothesis (green, summarizing a question+answer pair), and some partially relevant text (or a corpus), our goal is to generate an entailment tree, including intermediate nodes (blue), showing how the hypothesis follows from the text/corpus.