Table of Contents
Fetching ...

Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework

Jundong Xu, Hao Fei, Meng Luo, Qian Liu, Liangming Pan, William Yang Wang, Preslav Nakov, Mong-Li Lee, Wynne Hsu

TL;DR

This work defines a logic-complete reasoning framework, Aristotle, that tightly couples symbolic logic with LLM-driven reasoning across decomposition, search, and resolution to address both efficacy and efficiency gaps in logical tasks. The architecture comprises a Translator, Decomposer, Search Router, and Resolver, enabling a full decompose-search-resolve cycle that operates on CNF-normalized premises via a proof-by-contradiction scheme. Empirical results across ProntoQA, ProofWriter, and LogicNLI show that Aristotle consistently outperforms state-of-the-art baselines, with especially large gains on complex logical tasks and demonstrated generalizability across models such as GPT-4, GPT-4o, Claude, and LLaMA. The findings highlight near-perfect one-step reasoning accuracy in the Resolver, reduced search errors, and stable cost scaling with reasoning depth, underscoring the practical impact of integrating symbolic logic into end-to-end LLM reasoning. The work also provides open-source code to encourage adoption and further research in reliable, scalable logical reasoning with LLMs.

Abstract

In the context of large language models (LLMs), current advanced reasoning methods have made impressive strides in various reasoning tasks. However, when it comes to logical reasoning tasks, major challenges remain in both efficacy and efficiency. This is rooted in the fact that these systems fail to fully leverage the inherent structure of logical tasks throughout the reasoning processes such as decomposition, search, and resolution. To address this, we propose a logic-complete reasoning framework, Aristotle, with three key components: Logical Decomposer, Logical Search Router, and Logical Resolver. In our framework, symbolic expressions and logical rules are comprehensively integrated into the entire reasoning process, significantly alleviating the bottlenecks of logical reasoning, i.e., reducing sub-task complexity, minimizing search errors, and resolving logical contradictions. The experimental results on several datasets demonstrate that Aristotle consistently outperforms state-of-the-art reasoning frameworks in both accuracy and efficiency, particularly excelling in complex logical reasoning scenarios. We will open-source all our code at https://llm-symbol.github.io/Aristotle/.

Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework

TL;DR

This work defines a logic-complete reasoning framework, Aristotle, that tightly couples symbolic logic with LLM-driven reasoning across decomposition, search, and resolution to address both efficacy and efficiency gaps in logical tasks. The architecture comprises a Translator, Decomposer, Search Router, and Resolver, enabling a full decompose-search-resolve cycle that operates on CNF-normalized premises via a proof-by-contradiction scheme. Empirical results across ProntoQA, ProofWriter, and LogicNLI show that Aristotle consistently outperforms state-of-the-art baselines, with especially large gains on complex logical tasks and demonstrated generalizability across models such as GPT-4, GPT-4o, Claude, and LLaMA. The findings highlight near-perfect one-step reasoning accuracy in the Resolver, reduced search errors, and stable cost scaling with reasoning depth, underscoring the practical impact of integrating symbolic logic into end-to-end LLM reasoning. The work also provides open-source code to encourage adoption and further research in reliable, scalable logical reasoning with LLMs.

Abstract

In the context of large language models (LLMs), current advanced reasoning methods have made impressive strides in various reasoning tasks. However, when it comes to logical reasoning tasks, major challenges remain in both efficacy and efficiency. This is rooted in the fact that these systems fail to fully leverage the inherent structure of logical tasks throughout the reasoning processes such as decomposition, search, and resolution. To address this, we propose a logic-complete reasoning framework, Aristotle, with three key components: Logical Decomposer, Logical Search Router, and Logical Resolver. In our framework, symbolic expressions and logical rules are comprehensively integrated into the entire reasoning process, significantly alleviating the bottlenecks of logical reasoning, i.e., reducing sub-task complexity, minimizing search errors, and resolving logical contradictions. The experimental results on several datasets demonstrate that Aristotle consistently outperforms state-of-the-art reasoning frameworks in both accuracy and efficiency, particularly excelling in complex logical reasoning scenarios. We will open-source all our code at https://llm-symbol.github.io/Aristotle/.

Paper Structure

This paper contains 61 sections, 8 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Our reasoning framework vs. the SoTA ToT: comparison in terms of Search Error (SE) and single-step Reasoning Error (RE), as well as in terms of average number of reasoning steps.
  • Figure 2: Our Aristotle logical reasoning framework (best viewed via zooming in). In step 1, the Translator and Decomposer together transform $P$ and $S$ into $P_n$ and $S_n$. Then, we initialize the $C_{\text{current}}$ using the decomposed $S_n$ and $\neg S_n$. In step 2, the Search Router uses the $C_\text{current}$ and $P_n$ to search for $C_\text{complement}$. The Resolver then resolves $C_\text{current}$ with $C_\text{complement}$ to produce $C_\text{resolved}$. The reasoning complete if: (1) the $C_\text{resolved}$ determines whether a contradiction exists; (2) reach the maximum number of iterations $I_\text{max}$. In step 3, Aristotle then concludes the Proof $D_{S_n}$ and $D_{\neg S_n}$ based on the Proof Determination. Using these proofs, Aristotle determines the final answer based on Eq. (\ref{['eq:logical_classification']}). Note that two distinct reasoning paths will be implemented: a solid box representing the path starting from $\neg S_n$, and a dotted box representing the path starting from $S_n$. The complete reasoning process for both two paths, including all iterations are shown in the right part "Reasoning Trajectories".
  • Figure 3: Ablation results (w/ GPT-4o).
  • Figure 4: Accuracy vs. Efficiency on ProofWriter using GPT-4. Efficiency is measured as the average number of visited nodes/steps required to solve the problem. The upper-left corner is the optimal point, representing the best performance with the fewest visited nodes.
  • Figure 5: One-step reasoning accuracy using GPT-4o.
  • ...and 3 more figures