Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition

Lei Xu, Pierre Beckmann, Marco Valentino, André Freitas

TL;DR

This work addresses static solver integration in neuro-symbolic systems by proposing an adaptive framework that decomposes natural-language problems into subproblems with associated reasoning types, routes them to a portfolio of formal solvers via autoformalization, and aggregates the results. The end-to-end system $\mathcal{F}:\mathcal{X}\to\mathcal{A}$ comprises problem decomposition, routing, and solver-based reasoning, enabling dynamic, multi-paradigm inference. Empirical evaluation across five benchmarks shows high routing accuracy ($>98\%$) and strong gains (e.g., 92.1% on the Mixed dataset) with frontier models, while smaller models benefit from fine-tuning but remain below frontier baselines. Limitations include dependence on model scale, autoformalization bottlenecks, and limited solver coverage, pointing to future work on scaling, data-efficient formalization, and expanding the solver repertoire.
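The decompose, route, and solve stages summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation. All names (`ReasoningType`, `Subproblem`, `ROUTER`, `run_pipeline`, `toy_classify`) are hypothetical, the solvers are stubs that only echo their input, and the keyword-based classifier and semicolon splitting stand in for the LLM-based decomposition and type prediction. Only the four reasoning types (LP, FOL, CSP, SMT) come from the paper's Figure 1 caption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class ReasoningType(Enum):
    # The four reasoning types named in the Figure 1 caption.
    LP = "LP"
    FOL = "FOL"
    CSP = "CSP"
    SMT = "SMT"


@dataclass
class Subproblem:
    text: str
    reasoning_type: ReasoningType


Solver = Callable[[str], str]


def make_stub_solver(name: str) -> Solver:
    # Hypothetical stand-in: a real solver would autoformalize the text
    # and run formal inference; this stub only echoes its input.
    def solve(text: str) -> str:
        return f"{name} handled: {text}"
    return solve


# Router as a dispatch table from reasoning type to solver (assumed design).
ROUTER: Dict[ReasoningType, Solver] = {
    t: make_stub_solver(t.value) for t in ReasoningType
}


def decompose(problem: str,
              classify: Callable[[str], ReasoningType]) -> List[Subproblem]:
    # Assumption: subproblems are separated by ';'. The paper uses an LLM here.
    parts = [p.strip() for p in problem.split(";") if p.strip()]
    return [Subproblem(p, classify(p)) for p in parts]


def run_pipeline(problem: str,
                 classify: Callable[[str], ReasoningType]) -> List[str]:
    # Decompose, route each subproblem to its solver, and collect the answers.
    answers = []
    for sub in decompose(problem, classify):
        solver = ROUTER[sub.reasoning_type]
        answers.append(solver(sub.text))
    return answers


def toy_classify(text: str) -> ReasoningType:
    # Toy keyword rule, purely illustrative; not the paper's classifier.
    if "schedule" in text.lower():
        return ReasoningType.CSP
    return ReasoningType.FOL


result = run_pipeline("All birds fly; Schedule three talks", toy_classify)
print(result)
```

Keeping the router as a plain mapping means a new solver can be registered by adding one entry, which mirrors the idea of composing solvers dynamically rather than fixing one at design time.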

Abstract

Neuro-symbolic NLP methods aim to leverage the complementary strengths of large language models and formal logical solvers. However, current approaches are mostly static in nature, i.e., the integration of a target solver is predetermined at design time, hindering the ability to employ diverse formal inference strategies. To address this, we introduce an adaptive, multi-paradigm, neuro-symbolic inference framework that: (1) automatically identifies formal reasoning strategies from problems expressed in natural language; and (2) dynamically selects and applies specialized formal logical solvers via autoformalization interfaces. Extensive experiments on individual and multi-paradigm reasoning tasks support the following conclusions: LLMs are effective at predicting the necessary formal reasoning strategies with an accuracy above 90 percent. This enables flexible integration with formal logical solvers, resulting in our framework outperforming competing baselines by 27 percent and 6 percent compared to GPT-4o and DeepSeek-V3.1, respectively. Moreover, adaptive reasoning can even positively impact pure LLM methods, yielding gains of 10, 5, and 6 percent on zero-shot, CoT, and symbolic CoT settings with GPT-4o. Finally, although smaller models struggle with adaptive neuro-symbolic reasoning, post-training offers a viable path to improvement. Overall, this work establishes the foundations for adaptive LLM-symbolic reasoning, offering a path forward for unifying material and formal inferences on heterogeneous reasoning challenges.

Paper Structure

This paper contains 52 sections, 7 equations, 16 figures, 7 tables, 1 algorithm.

Figures (16)

  • Figure 1: Overview of the proposed adaptive symbolic reasoning framework. Given a natural language reasoning problem, the system first performs problem decomposition to extract structured components and identify the corresponding reasoning type (LP, FOL, CSP, SMT). Based on this analysis, a Router dynamically selects the appropriate solver for each problem and orchestrates the reasoning process. Each solver performs autoformalization on the structured input and conducts formal reasoning to produce verified answers. Unlike static approaches that predetermine solver integration, our framework adaptively composes specialized solvers through problem-aware classification and dynamic orchestration, enabling robust handling of heterogeneous reasoning tasks.
  • Figure 2: Model performance on Logic Deduction tasks of varying difficulty.
  • Figure 3: Distribution of error types of our framework for Logic Deduction tasks.
  • Figure 4: Distribution of error types of our framework. Each bar represents the number of incorrect predictions made by a model on a given dataset. Within each bar, different fill patterns indicate different error types.
  • Figure 5: Prompt for text parsing.
  • ...and 11 more figures