Atomic Inference for NLI with Generated Facts as Atoms

Joe Stacey, Pasquale Minervini, Haim Dubossarsky, Oana-Maria Camburu, Marek Rei

TL;DR

This work investigates using LLM-generated facts as atoms for atomic inference, decomposing Natural Language Inference premises into lists of facts. With a multi-stage fact generation process and a training regime that incorporates the facts, the fact-based method outperforms other approaches.

Abstract

With recent advances, neural models can achieve human-level performance on various natural language tasks. However, there are no guarantees that any explanations from these models are faithful, i.e. that they reflect the inner workings of the model. Atomic inference overcomes this issue, providing interpretable and faithful model decisions. This approach involves making predictions for different components (or atoms) of an instance, before using interpretable and deterministic rules to derive the overall prediction based on the individual atom-level predictions. We investigate the effectiveness of using LLM-generated facts as atoms, decomposing Natural Language Inference premises into lists of facts. While directly using generated facts in atomic inference systems can result in worse performance, with 1) a multi-stage fact generation process, and 2) a training regime that incorporates the facts, our fact-based method outperforms other approaches.
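
The abstract describes deriving the instance-level prediction from atom-level predictions with interpretable, deterministic rules, but does not spell the rules out. The sketch below shows what such an aggregation could look like when the atoms are facts generated from the premise. The rule ordering (any contradicting fact gives contradiction, otherwise any entailing fact gives entailment, otherwise neutral) and the name `aggregate_fact_predictions` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence

# The three standard NLI labels.
LABELS = ("entailment", "neutral", "contradiction")


def aggregate_fact_predictions(fact_labels: Sequence[str]) -> str:
    """Combine per-fact labels into one instance-level label.

    Assumed rules (illustrative only):
      1. If any fact contradicts the hypothesis, predict contradiction.
      2. Otherwise, if any fact entails the hypothesis, predict entailment.
      3. Otherwise, predict neutral.
    """
    for label in fact_labels:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")

    if "contradiction" in fact_labels:
        return "contradiction"
    if "entailment" in fact_labels:
        return "entailment"
    return "neutral"


# Hypothetical fact-level predictions for a six-fact premise.
example = ["neutral", "neutral", "neutral", "neutral",
           "contradiction", "contradiction"]
print(aggregate_fact_predictions(example))
```

Because the aggregation is a fixed rule over atom-level labels, the facts responsible for an instance-level decision can be read off directly, which is the sense in which the abstract calls the decision faithful.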

Paper Structure (24 sections, 5 equations, 6 figures, 11 tables)

Figures (6)

  • Figure 1: Generated fact-lists for a test example, including: 1) an initially generated fact-list, 2) a second generated fact-list that can be concatenated with the first list, 3) facts that an LLM identifies as missing from the original fact-list, and 4) a generated fact that is conditioned on the hypothesis.
  • Figure 2: Inference and training framework from our work decomposing the NLI premise, compared to the rules used by Joe_Logic when decomposing the NLI hypothesis.
  • Figure 3: We show the premise, hypothesis and the generated fact list for a hypothesis-premise pair in the dev set that both FactAI and FGLR correctly predict. However, despite correct instance-level predictions, we see FactAI predicting contradiction for fact #6, even when this is not appropriate. In this case, the acquisition in 2015 by Postmedia does not contradict there also being an acquisition by Toronto Sun Publishing in 1988. The hypothesis conditioned fact generated for FGLR is almost identical to fact #5, and FGLR predicts both facts as contradiction.
  • Figure 4: Our FGLR method is summarised above, with five stages: 1) generating fact lists for each premise, 2) generating an additional fact when performing inference, prompting GPT-3 to create a relevant fact from the premise for a specific hypothesis, 3) creating a representation for each fact, 4) our fact-level and instance-level losses used in training, and 5) evaluation using our evaluation rules.
  • Figure 5: The premise, hypothesis and the generated fact list for a hypothesis-premise pair in the dev set. The 6th fact is the hypothesis-conditioned fact (the content of this fact overlaps with the 5th fact provided). The example provided is from the DeBERTa-base FGLR model. FGLR correctly predicts facts 5 and 6 as contradiction (with all other facts being neutral).
  • ...and 1 more figure