ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning

Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal

TL;DR

ExplaGraphs introduces a generative, structured explanation-graph task for stance prediction to address the limitations of discriminative commonsense-reasoning benchmarks. It provides a two-stage data collection framework (stance triples first, then explanation graphs via Create-Verify-And-Refine), a multi-level evaluation pipeline, and a commonsense-augmented structured model with integer linear programming (ILP) constraints. Results show that models struggle to generate high-quality explanation graphs, with a large gap to human performance, underscoring the need for advances in graph-based commonsense reasoning. The work contributes a publicly available dataset and evaluation tools to support future research on explainable, structured commonsense-reasoning systems.

Abstract

Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context. Discriminative tasks are limiting because they fail to adequately evaluate the model's ability to reason and explain predictions with underlying commonsense knowledge. They also allow such models to use reasoning shortcuts and not be "right for the right reasons". In this work, we present ExplaGraphs, a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction. Specifically, given a belief and an argument, a model has to predict if the argument supports or counters the belief and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance. We collect explanation graphs through a novel Create-Verify-And-Refine graph collection framework that improves the graph quality (up to 90%) via multiple rounds of verification and refinement. A significant 79% of our graphs contain external commonsense nodes with diverse structures and reasoning depths. Next, we propose a multi-level evaluation framework, consisting of automatic metrics and human evaluation, that checks for the structural and semantic correctness of the generated graphs and their degree of match with ground-truth graphs. Finally, we present several structured, commonsense-augmented, and text generation models as strong starting points for this explanation graph generation task, and observe that there is a large gap with human performance, thereby encouraging future work for this new challenging task. ExplaGraphs will be publicly available at https://explagraphs.github.io.
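
To make the task format concrete, the following is a minimal sketch of how one instance (belief, argument, stance, explanation graph) might be represented, together with a simple structural check. Everything here is an assumption for illustration: the class names `Edge` and `Instance`, the function `is_connected_dag`, the relation labels, and the example belief and argument are hypothetical and are not taken from the dataset. The rule that a graph must be connected and acyclic is likewise an assumed stand-in for the structural correctness check that the abstract mentions without specifying.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    head: str      # source concept
    relation: str  # illustrative relation label (hypothetical)
    tail: str      # target concept


@dataclass
class Instance:
    belief: str
    argument: str
    stance: str                       # "support" or "counter"
    graph: list = field(default_factory=list)  # list of Edge


def is_connected_dag(edges):
    """Assumed structural check: non-empty, weakly connected, and acyclic."""
    if not edges:
        return False
    nodes = {e.head for e in edges} | {e.tail for e in edges}
    undirected = defaultdict(set)
    successors = defaultdict(set)
    indegree = {n: 0 for n in nodes}
    for e in edges:
        undirected[e.head].add(e.tail)
        undirected[e.tail].add(e.head)
        if e.tail not in successors[e.head]:
            successors[e.head].add(e.tail)
            indegree[e.tail] += 1

    # Weak connectivity: breadth-first search ignoring edge direction.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in undirected[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    if seen != nodes:
        return False

    # Acyclicity: Kahn's algorithm must be able to remove every node.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)


# Hypothetical example, not drawn from the dataset.
example = Instance(
    belief="Zoos should be banned.",
    argument="Zoos keep animals in captivity.",
    stance="support",
    graph=[
        Edge("zoos", "causes", "captivity"),
        Edge("captivity", "causes", "animal suffering"),
        Edge("animal suffering", "not desires", "zoos allowed"),
    ],
)
```

Calling `is_connected_dag(example.graph)` accepts this chain-shaped graph, while a graph with a cycle or with two disjoint components would be rejected. A check for semantic correctness, which the abstract also mentions, is outside the scope of this sketch.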

Paper Structure

This paper contains 52 sections, 3 equations, 27 figures, 9 tables.

Figures (27)

  • Figure 1: Two representative examples from our dataset. Explanation graphs are read and reasoned through by following the edges that explain why the argument supports or counters the belief.
  • Figure 2: Interface for our data collection framework consisting of two stages. In Stage 1, we collect (belief, argument, stance) triples in pre-HAMLET and multiple HAMLET (human-and-model-in-the-loop) rounds. In each HAMLET round, we collect harder examples by asking the annotators to fool a stance prediction model. In Stage 2, we collect the corresponding explanation graphs through a Create-Verify-And-Refine framework.
  • Figure 3: Explanation Graph Creation Interface.
  • Figure 4: Our multi-level evaluation framework.
  • Figure 5: Our Commonsense-Augmented Structured Prediction Model for explanation graph generation.
  • ...and 22 more figures