Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition
Lei Xu, Pierre Beckmann, Marco Valentino, André Freitas
TL;DR
Addresses static solver integration in neuro-symbolic systems by proposing an adaptive framework that decomposes natural-language problems into subproblems with associated reasoning types, routes each subproblem to a portfolio of formal solvers via autoformalization, and aggregates the results. The end-to-end system $\mathcal{F}:\mathcal{X}\to\mathcal{A}$ comprises problem decomposition, routing, and solver-based reasoning, enabling dynamic, multi-paradigm inference. Empirical evaluation across five benchmarks shows high routing accuracy ($>98\%$) and strong performance (e.g., 92.1\% on the Mixed dataset) with frontier models, while smaller models benefit from fine-tuning but remain below frontier baselines. Limitations include dependence on model scale, autoformalization bottlenecks, and limited solver coverage, pointing to future work on scaling, data-efficient formalization, and expanding the solver repertoire.
Abstract
Neuro-symbolic NLP methods aim to leverage the complementary strengths of large language models and formal logical solvers. However, current approaches are mostly static: the integration of a target solver is predetermined at design time, which hinders the use of diverse formal inference strategies. To address this, we introduce an adaptive, multi-paradigm, neuro-symbolic inference framework that (1) automatically identifies formal reasoning strategies from problems expressed in natural language and (2) dynamically selects and applies specialized formal logical solvers via autoformalization interfaces. Extensive experiments on individual and multi-paradigm reasoning tasks support the following conclusions. First, LLMs are effective at predicting the necessary formal reasoning strategies, with an accuracy above 90 percent. This enables flexible integration with formal logical solvers, resulting in our framework outperforming competing baselines by 27 percent and 6 percent compared to GPT-4o and DeepSeek-V3.1, respectively. Second, adaptive reasoning can even positively impact pure LLM methods, yielding gains of 10, 5, and 6 percent on zero-shot, CoT, and symbolic CoT settings with GPT-4o. Finally, although smaller models struggle with adaptive neuro-symbolic reasoning, post-training offers a viable path to improvement. Overall, this work establishes the foundations for adaptive LLM-symbolic reasoning, offering a path forward for unifying material and formal inferences on heterogeneous reasoning challenges.
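The decompose, route, and solve flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: every name in it (`Subproblem`, `decompose`, `autoformalize`, `route`, `PORTFOLIO`, `answer`) is hypothetical, the reasoning-type labels "deductive" and "constraint" are illustrative rather than the paper's taxonomy, and the LLM and solver calls are replaced by toy stubs so that the control flow runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subproblem:
    text: str
    reasoning_type: str  # illustrative label, e.g. "deductive" or "constraint"


def decompose(problem: str) -> List[Subproblem]:
    """Stand-in for the LLM decomposition step (assumption): split on ';'
    and tag each part with a reasoning type via a toy keyword rule."""
    parts = [p.strip() for p in problem.split(";") if p.strip()]
    subs = []
    for part in parts:
        kind = "constraint" if "schedule" in part.lower() else "deductive"
        subs.append(Subproblem(part, kind))
    return subs


def autoformalize(sub: Subproblem) -> str:
    """Placeholder for LLM autoformalization (assumption): a real system
    would emit a program in the target solver's input language."""
    return sub.text.lower().replace(" ", "_")


# Stub solvers: real ones would execute the formal program; these only
# wrap their input so the routing decision is visible in the output.
def deductive_solver(formal: str) -> str:
    return f"proved({formal})"


def constraint_solver(formal: str) -> str:
    return f"assignment({formal})"


# Hypothetical solver portfolio keyed by reasoning type.
PORTFOLIO: Dict[str, Callable[[str], str]] = {
    "deductive": deductive_solver,
    "constraint": constraint_solver,
}


def route(sub: Subproblem) -> Callable[[str], str]:
    """Select the solver at run time from the predicted reasoning type."""
    try:
        return PORTFOLIO[sub.reasoning_type]
    except KeyError:
        raise ValueError(f"no solver for type {sub.reasoning_type!r}")


def answer(problem: str) -> List[str]:
    """End-to-end sketch: decompose, route, formalize, solve, collect."""
    results = []
    for sub in decompose(problem):
        solver = route(sub)
        results.append(solver(autoformalize(sub)))
    return results
```

For example, `answer("Is Socrates mortal; Schedule three talks")` sends the first subproblem to the deductive stub and the second to the constraint stub. The point of the sketch is that the solver is chosen per subproblem at run time, in contrast to a static design where a single solver is fixed in advance.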
