
Hybrid Models for Natural Language Reasoning: The Case of Syllogistic Logic

Manuel Vargas Guzmán, Jakub Szymanik, Maciej Malicki

TL;DR

The study separates compositionality and recursiveness as distinct generalization challenges in neural reasoning, showing that LLMs handle recursive inference reasonably well but struggle with compositional generalization on syllogistic tasks. It proposes a hybrid architecture in which neural components guide a symbolic prover by proposing a relevant subset of premises and a formula whose contradiction follows from the knowledge base together with the negated hypothesis. This yields substantial efficiency gains while preserving logical completeness. On controlled synthetic data, the approach reduces the number of search steps by up to roughly 1000x, highlighting the complementary strengths of neural and symbolic methods. The work points to a practical path toward reliable, scalable neural-symbolic reasoning and motivates extending the framework to richer logics and inference building blocks.

Abstract

Despite the remarkable progress in neural models, their ability to generalize, a cornerstone for applications like logical reasoning, remains a critical challenge. We delineate two fundamental aspects of this ability: compositionality, the capacity to abstract the atomic logical rules underlying complex inferences, and recursiveness, the aptitude to build intricate representations through iterative application of inference rules. In the literature, these two aspects are often conflated under the umbrella term of generalization. To sharpen this distinction, we investigated the logical generalization capabilities of pre-trained large language models (LLMs) using the syllogistic fragment as a benchmark for natural language reasoning. Though simple, this fragment provides a foundational yet expressive subset of formal logic that supports controlled evaluation of essential reasoning abilities. Our findings reveal a significant disparity: while LLMs demonstrate reasonable proficiency in recursiveness, they struggle with compositionality. To overcome these limitations and establish a reliable logical prover, we propose a hybrid architecture integrating symbolic reasoning with neural computation. This synergistic interaction enables robust and efficient inference: neural components accelerate processing, while symbolic reasoning ensures completeness. Our experiments show that high efficiency is preserved even with relatively small neural components. As part of our proposed methodology, this analysis provides a rationale for hybrid models and highlights their potential to effectively address key generalization barriers in neural reasoning systems.
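The division of labour described above (neural components accelerate, symbolic reasoning guarantees completeness) can be illustrated with a toy control loop. This is a minimal sketch under loud assumptions, not the authors' implementation: formulas are restricted to A-sentences ("All x are y"), so derivability reduces to reachability in the knowledge-base graph; the trained neural assistant is replaced by a hypothetical callable; and the contradiction-proposing component is not modelled. The names `prove`, `hybrid_prove`, `good_assistant`, and `bad_assistant` are illustrative only.

```python
from collections import deque
from typing import Callable, Iterable

# Assumption for this sketch: only A-formulas, where (x, y) means "All x are y".
Formula = tuple[str, str]
# Hypothetical stand-in for a trained neural assistant: (kb, hypothesis) -> premises.
Assistant = Callable[[list[Formula], Formula], list[Formula]]


def prove(premises: Iterable[Formula], hypothesis: Formula) -> tuple[bool, int]:
    """Symbolic search: "All x are y" follows iff y is reachable from x.

    Returns (derivable, steps), where steps counts the terms expanded by a
    breadth-first search over the premise graph (chained Barbara inferences).
    """
    graph: dict[str, list[str]] = {}
    for x, y in premises:
        graph.setdefault(x, []).append(y)
    start, goal = hypothesis
    seen, queue, steps = {start}, deque([start]), 0
    while queue:
        term = queue.popleft()
        steps += 1
        if term == goal:
            return True, steps
        for nxt in graph.get(term, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False, steps


def hybrid_prove(kb: list[Formula], hypothesis: Formula,
                 assistant: Assistant) -> tuple[bool, int]:
    """Search the assistant's premise subset first, then fall back to the full KB."""
    kb_set = set(kb)
    # Keep only proposals that are actually in the KB, so hallucinated
    # premises can never make the result unsound.
    proposed = [f for f in assistant(kb, hypothesis) if f in kb_set]
    found, steps = prove(proposed, hypothesis)
    if found:
        return True, steps
    # Fallback over the whole KB: a bad proposal costs extra steps but
    # never hides a provable hypothesis (completeness is preserved).
    found_full, steps_full = prove(kb, hypothesis)
    return found_full, steps + steps_full


def good_assistant(kb: list[Formula], hypothesis: Formula) -> list[Formula]:
    return [("a", "b"), ("b", "c"), ("c", "d")]  # accurate guess


def bad_assistant(kb: list[Formula], hypothesis: Formula) -> list[Formula]:
    return [("a", "d")]  # hallucinated premise, not in the KB


# Toy KB: a chain a -> b -> c -> d buried among 50 distractor premises.
kb = [("a", f"x{i}") for i in range(50)] + [("a", "b"), ("b", "c"), ("c", "d")]

print(prove(kb, ("a", "d")))                         # (True, 54)
print(hybrid_prove(kb, ("a", "d"), good_assistant))  # (True, 4)
print(hybrid_prove(kb, ("a", "d"), bad_assistant))   # (True, 55)
```

In this toy setting the purely symbolic search expands 54 terms, a correct proposal needs 4, and a hallucinated proposal is discarded so the fallback still finds the proof. These numbers describe only this example and say nothing about the gains reported in the paper.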

Paper Structure

This paper contains 29 sections, 5 figures, 19 tables, and 3 algorithms.

Figures (5)

  • Figure 1: Overview of the hybrid architecture. Input: a knowledge base $\mathcal{KB}$ and a hypothesis $H$. Hybrid Model: the neural models assist the symbolic prover by providing a subset $\mathcal{P} \subset \mathcal{KB}$ such that $\mathcal{P} \vdash H$, and a formula $F$ such that $\mathcal{KB} \cup \{ \overline{H} \} \vdash F \wedge \overline{F}$. Output: a proof $\triangledown$ of $H$ from $\mathcal{KB}$.
  • Figure 2: Example of a knowledge base $\mathcal{KB}$ represented as a graph along with valid inferences that can be derived from $\mathcal{KB}$.
  • Figure 3: Example of a conversion from a set of syllogistic formulas represented as a graph to a set of formulas in natural language (using pseudoword substitutions) and vice-versa.
  • Figure 4: Generalization performance of GPT and T5 architectures across the five shortest (compositional) and five longest (recursive) unseen A-chain lengths, denoted as $\sigma(t)$ and $\mu(t)$, respectively, for each syllogism type $t$.
  • Figure 5: Geometric mean and standard deviation of the number of steps for the Symbolic and Hybrid models, using different assistants trained on GPT and T5. OVE, COM, and REC denote overall, compositional, and recursive models, respectively.
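Figures 2 and 3 describe a knowledge base as a graph of syllogistic formulas and its conversion to natural-language sentences over pseudowords and back. A minimal round-trip sketch, assuming each formula is a labelled edge (form, subject, predicate), fixed English templates for the four syllogistic forms A, E, I, O, and a small hypothetical pseudoword pool; `PSEUDOWORDS`, `make_lexicon`, `to_text`, and `from_text` are illustrative names, not the paper's code or vocabulary:

```python
import re

# Each formula is a labelled edge (form, subject, predicate) of the KB graph;
# form is one of the four syllogistic types A, E, I, O.
Formula = tuple[str, str, str]

TEMPLATES = {
    "A": "All {x} are {y}",
    "E": "No {x} are {y}",
    "I": "Some {x} are {y}",
    "O": "Some {x} are not {y}",
}

# Matched with fullmatch, so "Some x are not y" (O) can never be read as
# an I-sentence; O is also listed first for clarity.
PATTERNS = [
    ("A", re.compile(r"All (\w+) are (\w+)")),
    ("E", re.compile(r"No (\w+) are (\w+)")),
    ("O", re.compile(r"Some (\w+) are not (\w+)")),
    ("I", re.compile(r"Some (\w+) are (\w+)")),
]

# Hypothetical pseudoword pool (plural nonce nouns); not the paper's vocabulary.
PSEUDOWORDS = ["wugs", "blickets", "daxes", "feps"]


def make_lexicon(formulas: list[Formula]) -> dict[str, str]:
    """Assign one pseudoword to each abstract term, in sorted term order."""
    terms = sorted({t for _, x, y in formulas for t in (x, y)})
    if len(terms) > len(PSEUDOWORDS):
        raise ValueError("not enough pseudowords for the terms in the KB")
    return dict(zip(terms, PSEUDOWORDS))


def to_text(formula: Formula, lexicon: dict[str, str]) -> str:
    form, x, y = formula
    return TEMPLATES[form].format(x=lexicon[x], y=lexicon[y])


def from_text(sentence: str, lexicon: dict[str, str]) -> Formula:
    inverse = {word: term for term, word in lexicon.items()}
    for form, pattern in PATTERNS:
        match = pattern.fullmatch(sentence)
        if match:
            return form, inverse[match.group(1)], inverse[match.group(2)]
    raise ValueError(f"not a syllogistic sentence: {sentence!r}")


kb = [("A", "a", "b"), ("E", "b", "c"), ("O", "c", "a")]
lexicon = make_lexicon(kb)
sentences = [to_text(f, lexicon) for f in kb]
print(sentences)
# ['All wugs are blickets', 'No blickets are daxes', 'Some daxes are not wugs']
print([from_text(s, lexicon) for s in sentences] == kb)  # True
```

Keeping the lexicon as an explicit mapping makes the conversion invertible, so a model's natural-language output can be mapped back onto graph edges for the symbolic prover.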

Theorems & Definitions (3)

  • Definition 1: Inference
  • Definition 2: Proof
  • Definition 3: Minimal Inference
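Definition 3 is only named here. Assuming the usual reading, that an inference from premises P to a hypothesis H is minimal when P entails H but no proper subset of P does, the property can be checked and a minimal premise set extracted as sketched below. The sketch restricts itself to A-formulas ("All x are y") so that entailment is graph reachability; `derives`, `is_minimal`, and `shrink_to_minimal` are hypothetical helpers, and the paper's precise definition may differ. Because entailment is monotone, testing the removal of one premise at a time is sufficient.

```python
from collections import deque
from typing import Iterable, Optional

# Assumption: only A-formulas, where (x, y) means "All x are y".
Formula = tuple[str, str]


def derives(premises: Iterable[Formula], hypothesis: Formula) -> bool:
    """With A-formulas only, "All x are y" follows iff y is reachable from x."""
    graph: dict[str, set[str]] = {}
    for x, y in premises:
        graph.setdefault(x, set()).add(y)
    start, goal = hypothesis
    seen, queue = {start}, deque([start])
    while queue:
        term = queue.popleft()
        if term == goal:
            return True
        for nxt in graph.get(term, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_minimal(premises: list[Formula], hypothesis: Formula) -> bool:
    """P is minimal iff P derives H and no proper subset of P does.

    By monotonicity, checking the subsets that drop exactly one premise is enough.
    """
    if not derives(premises, hypothesis):
        return False
    return all(
        not derives(premises[:i] + premises[i + 1:], hypothesis)
        for i in range(len(premises))
    )


def shrink_to_minimal(kb: list[Formula], hypothesis: Formula) -> Optional[list[Formula]]:
    """Greedily drop premises that are not needed; None if H is not derivable."""
    current = list(dict.fromkeys(kb))  # drop duplicates, keep order
    if not derives(current, hypothesis):
        return None
    for premise in list(current):
        trial = [p for p in current if p != premise]
        if derives(trial, hypothesis):
            current = trial
    return current


kb = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("x", "y")]
core = shrink_to_minimal(kb, ("a", "d"))
print(core)                                                          # [('a', 'c'), ('c', 'd')]
print(is_minimal(core, ("a", "d")))                                  # True
print(is_minimal([("a", "b"), ("b", "c"), ("c", "d")], ("a", "d")))  # True
print(is_minimal(kb, ("a", "d")))                                    # False
```

Greedy deletion returns a minimal premise set, not necessarily a smallest one: the result depends on the order in which premises are tried, and in this toy KB both `[("a", "c"), ("c", "d")]` and `[("a", "b"), ("b", "c"), ("c", "d")]` are minimal.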