Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming
Hanlin Zhang, Jiani Huang, Ziyang Li, Mayur Naik, Eric Xing
TL;DR
The paper tackles the poor logical reasoning capabilities of pre-trained LMs by introducing DSR-LM, a differentiable neuro-symbolic framework in which a perception LM extracts probabilistic relations and a differentiable symbolic engine performs deductive reasoning with learned rules. It adds a semantic loss derived from integrity constraints and trains the rule weights jointly with the LM, enabling end-to-end optimization and interpretable rule induction. Empirical results on CLUTRR and DBpedia-INF show substantial gains in deductive accuracy and stronger generalization to long reasoning chains, outperforming a broad set of baselines including GPT-3 variants. The approach demonstrates the value of integrating differentiable symbolic programming with neural perception to improve robustness and interpretability in reasoning tasks.
Abstract
Pre-trained large language models (LMs) struggle to perform logical reasoning reliably despite advances in scale and compositionality. In this work, we tackle this challenge through the lens of symbolic programming. We propose DSR-LM, a Differentiable Symbolic Reasoning framework where pre-trained LMs govern the perception of factual knowledge, and a symbolic module performs deductive reasoning. In contrast to works that rely on hand-crafted logic rules, our differentiable symbolic reasoning framework efficiently learns weighted rules and applies semantic loss to further improve LMs. DSR-LM is scalable, interpretable, and allows easy integration of prior knowledge, thereby supporting extensive symbolic programming to robustly derive a logical conclusion. The results of our experiments suggest that DSR-LM improves the logical reasoning abilities of pre-trained language models, resulting in a significant increase in accuracy of over 20% on deductive reasoning benchmarks. Furthermore, DSR-LM outperforms a variety of competitive baselines when faced with systematic changes in sequence length.
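The division of labor described above (an LM supplies probabilistic facts, a symbolic module composes them with weighted rules) can be illustrated with a minimal sketch. This is not the authors' implementation: the fact table, the rule table, the `deduce` helper, and the max-product scoring are all assumptions made for illustration, and plain floats stand in for the differentiable tensors that a trainable system would use.

```python
from itertools import product

# Hypothetical output of the perception LM: P(relation | entity pair).
# r(a, b) is read as "a is the r of b".
facts = {
    ("alice", "bob"): {"mother": 0.9, "sister": 0.1},
    ("bob", "carol"): {"father": 0.8, "brother": 0.2},
}

# Hypothetical weighted composition rules: (r1, r2) -> (r3, weight).
# Read as: if r1(a, b) and r2(b, c), then r3(a, c) with confidence `weight`.
# In a learned setting these weights would be trainable parameters.
rules = {
    ("mother", "father"): ("grandmother", 0.95),
    ("sister", "father"): ("aunt", 0.90),
}


def deduce(facts, rules):
    """Apply every rule once to every chain a -> b -> c of extracted facts.

    Each derivation is scored as weight * P(r1) * P(r2); when the same
    conclusion has several derivations, the best score is kept
    (an assumed max-product scheme, chosen here for simplicity).
    """
    derived = {}
    for ((a, b), rels_ab), ((b2, c), rels_bc) in product(facts.items(), repeat=2):
        # Only chain facts that share the middle entity.
        if b != b2 or a == c:
            continue
        for (r1, p1), (r2, p2) in product(rels_ab.items(), rels_bc.items()):
            if (r1, r2) not in rules:
                continue
            r3, weight = rules[(r1, r2)]
            score = weight * p1 * p2
            scores = derived.setdefault((a, c), {})
            scores[r3] = max(scores.get(r3, 0.0), score)
    return derived


result = deduce(facts, rules)
for pair, scores in result.items():
    print(pair, scores)
```

With these made-up numbers, the pair (alice, carol) receives a score of roughly 0.68 for "grandmother" and roughly 0.07 for "aunt". Because every score is a product of rule weights and fact probabilities, the same computation expressed over tensors would allow gradients to flow back to both the rule weights and the LM, which is the property the abstract relies on for joint training.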
