Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining
Shuqi Liu, Bowei He, Linqi Song
TL;DR
Bi-Chainer tackles the challenge of complex, multi-step logical reasoning in large language models by introducing bidirectional chaining that dynamically switches between forward and backward reasoning in a depth-first manner. It integrates six LLM-driven modules to identify facts, select rules, deduce or abduce, check facts, and monitor confusion, using guidance from the opposite reasoning direction to reduce branching and inference calls. Across four challenging datasets (ProofWriter, FOLIO, AR-LSAT, ParaRules), Bi-Chainer achieves sizable improvements in label accuracy and, crucially, near-perfect proof accuracy (up to 98%), while significantly reducing the number of LLM calls compared with forward-only or backward-only baselines. The approach demonstrates stronger reasoning reliability and efficiency, with additional validation on open-source models and a detailed case study illustrating how bidirectional guidance mitigates premise confusion. These findings suggest bidirectional, depth-first reasoning as a practical path to more capable and efficient automated reasoning in LLMs.
Abstract
Large Language Models (LLMs) have shown human-like reasoning abilities but still face challenges in solving complex logical problems. Existing unidirectional chaining methods, such as forward chaining and backward chaining, suffer from low prediction accuracy and low efficiency. To address these issues, we propose a bidirectional chaining method, Bi-Chainer, which dynamically switches to depth-first reasoning in the opposite reasoning direction when it encounters multiple branching options within the current direction. The intermediate reasoning results from one direction can thus be used as guidance for the other. We show that Bi-Chainer achieves sizable accuracy boosts over unidirectional chaining frameworks on four challenging logical reasoning datasets. Moreover, Bi-Chainer enhances the accuracy of intermediate proof steps and reduces the average number of inference calls, resulting in more efficient and accurate reasoning.
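
The switching idea can be illustrated with a toy, purely symbolic sketch. It is not the Bi-Chainer pipeline: there are no LLM calls and none of the six modules, and the rule format, the function name `bi_chain`, and the example facts are all assumptions made for illustration. The sketch chains forward over Horn-style rules and, when several rules are applicable at once (a branching point), switches to a backward pass that collects subgoals from the goal, then uses those subgoals to choose the forward rule.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical rule format: (premises, conclusion) over atomic propositions.
Rule = Tuple[FrozenSet[str], str]


def bi_chain(facts: Set[str], rules: List[Rule], goal: str) -> Tuple[bool, List[str]]:
    """Toy bidirectional chainer; returns (goal_proved, trace of steps)."""
    known = set(facts)   # facts derived so far (forward direction)
    relevant = {goal}    # subgoals collected so far (backward direction)
    trace: List[str] = []

    while goal not in known:
        # Forward candidates: all premises known, conclusion still new.
        ready = [r for r in rules if r[0] <= known and r[1] not in known]
        if not ready:
            return False, trace  # nothing more can be derived
        chosen = ready[0]
        if len(ready) > 1:
            # Branching: prefer a rule whose conclusion is a known subgoal.
            guided = [r for r in ready if r[1] in relevant]
            if guided:
                chosen = guided[0]
            else:
                # No guidance yet: switch direction and expand subgoals.
                new = {p for prem, concl in rules
                       if concl in relevant and concl not in known
                       for p in prem} - relevant
                if new:
                    relevant |= new
                    trace.append(f"backward: need {sorted(new)}")
                    continue
                # Backward pass is exhausted; fall back to the first candidate.
        known.add(chosen[1])
        trace.append(f"forward: {chosen[1]}")
    return True, trace


# Example with two distractor rules (a -> x, x -> y) that never help.
facts = {"a", "b"}
rules: List[Rule] = [
    (frozenset({"a"}), "x"),
    (frozenset({"b"}), "c"),
    (frozenset({"c"}), "d"),
    (frozenset({"x"}), "y"),
    (frozenset({"d"}), "goal"),
]
proved, trace = bi_chain(facts, rules, "goal")
print(proved)
print("\n".join(trace))
```

In this example the forward direction branches immediately, since both `a -> x` and `b -> c` are applicable. Two backward expansions identify `d` and then `c` as subgoals, after which the forward pass derives `c`, `d`, and `goal` without ever deriving the distractor `x`. Plain forward chaining in rule order would have derived `x` first.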
