Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation
Zhouhao Sun, Xiao Ding, Li Du, Bibo Cai, Jinglong Gao, Ting Liu, Qin Bing
TL;DR
The paper addresses the challenge that large language models struggle with first-order logic reasoning over natural-language theories due to theoretical incompleteness and hallucination. It introduces GFaiR, a framework that leverages resolution refutation with five components (Converter, Pre-Selector, Post-Selector, Knowledge Composer, Verifier) to perform complete and faithful NL-level reasoning, including a validity-contrastive loss to maintain logical coherence. GFaiR achieves state-of-the-art results on hard RuleTaker datasets, demonstrates strong generalization to deeper reasoning depths and to datasets with existence quantifiers, and provides evidence of faithful reasoning through its structured proof-like intermediate steps. This approach advances reliable, logic-grounded reasoning in NL settings, with potential impacts on AI systems requiring robust deductive capabilities.
Abstract
Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because the previous LLMs-based reasoning systems have the theoretical incompleteness issue. As a result, it can only address a limited set of simple reasoning problems, which significantly decreases their generalization ability. To address this issue, we propose a novel framework, named Generalizable and Faithful Reasoner (GFaiR), which introduces the paradigm of resolution refutation. Resolution refutation has the capability to solve all first-order logic reasoning problems by extending reasoning rules and employing the principle of proof by contradiction, so our system's completeness can be improved by introducing resolution refutation. Experimental results demonstrate that our system outperforms previous works by achieving state-of-the-art performances in complex scenarios while maintaining performances in simple scenarios. Besides, we observe that GFaiR is faithful to its reasoning process.
