Intermediate Languages Matter: Formal Choice Drives Neurosymbolic LLM Reasoning
Alexander Beiser, David Penz, Nysret Musliu
TL;DR
This paper investigates how the choice of the formal intermediate language influences neurosymbolic LLM reasoning. By comparing four formal languages (Pyke, ASP, NLTK, FOL) across three datasets (ProntoQA, ProofWriter, FOLIO) and six LLMs, the authors quantify the impact of syntax and encoding strategies via an extensive ablation study. They find that First-Order Logic (FOL) typically yields better reasoning performance than logic-programming languages, and that context-aware encodings improve accuracy, while comments or Markdown syntax offer no consistent benefit. The study formalizes the intermediate language challenge and demonstrates that the right formal language, together with thoughtful ICL-encoding, can significantly enhance neurosymbolic reasoning, with practical implications for deploying such systems in resource-constrained settings and guiding future research into additional languages and prompting strategies.
Abstract
Large language models (LLMs) achieve astonishing results on a wide range of tasks. However, their formal reasoning ability still lags behind. A promising approach is Neurosymbolic LLM reasoning. It works by using LLMs as translators from natural to formal languages and symbolic solvers for deriving correct results. Still, it remains unclear what the contributing factors to the success of Neurosymbolic LLM reasoning are. This paper shows that one important factor is the choice of the formal language. By comparing 4 formal languages on 3 datasets over 6 LLMs, we show that the choice of formal language affects both the syntactic and the semantic reasoning capability. Thereby, we introduce the intermediate language challenge, which is the challenge of picking a suitable formal language for neurosymbolic reasoning. Further, we compare the effects of using different in-context-learning examples in an ablation study. We conclude that on average, context-aware encodings help LLMs to reason, while there is no apparent effect of using comments or markdown syntax.
