Table of Contents
Fetching ...

Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

Zhouhao Sun, Xiao Ding, Li Du, Bibo Cai, Jinglong Gao, Ting Liu, Qin Bing

TL;DR

The paper addresses the challenge that large language models struggle with first-order logic reasoning over natural-language theories due to theoretical incompleteness and hallucination. It introduces GFaiR, a framework that leverages resolution refutation with five components (Converter, Pre-Selector, Post-Selector, Knowledge Composer, Verifier) to perform complete and faithful NL-level reasoning, including a validity-contrastive loss to maintain logical coherence. GFaiR achieves state-of-the-art results on hard RuleTaker datasets, demonstrates strong generalization to deeper reasoning depths and to datasets with existence quantifiers, and provides evidence of faithful reasoning through its structured proof-like intermediate steps. This approach advances reliable, logic-grounded reasoning in NL settings, with potential impacts on AI systems requiring robust deductive capabilities.

Abstract

Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because the previous LLMs-based reasoning systems have the theoretical incompleteness issue. As a result, it can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation has the capability to solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performances in complex scenarios while maintaining performances in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.

Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

TL;DR

The paper addresses the challenge that large language models struggle with first-order logic reasoning over natural-language theories due to theoretical incompleteness and hallucination. It introduces GFaiR, a framework that leverages resolution refutation with five components (Converter, Pre-Selector, Post-Selector, Knowledge Composer, Verifier) to perform complete and faithful NL-level reasoning, including a validity-contrastive loss to maintain logical coherence. GFaiR achieves state-of-the-art results on hard RuleTaker datasets, demonstrates strong generalization to deeper reasoning depths and to datasets with existence quantifiers, and provides evidence of faithful reasoning through its structured proof-like intermediate steps. This approach advances reliable, logic-grounded reasoning in NL settings, with potential impacts on AI systems requiring robust deductive capabilities.

Abstract

Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because the previous LLMs-based reasoning systems have the theoretical incompleteness issue. As a result, it can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation has the capability to solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performances in complex scenarios while maintaining performances in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.
Paper Structure (24 sections, 2 equations, 2 figures, 6 tables)

This paper contains 24 sections, 2 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: (a) Example of an NL Theory and a hypothesis with gold answers. Note that the meaning of these statements are not related to the common sense. (b) For hypothesis 1, the reasoning process using the method of resolution refutation is shown. The process of refutation is reflected from 'hypothesis' to "Bob is kind" and the grey box represents the process of resolution at the natural language level.
  • Figure 2: Architecture of GFaiR. We mark the converter in orange, the selector (consisting of pre-selector and post-selector) in green, the verifier with its validity contrastive loss in yellow, and the knowledge composer in blue, respectively.