Exploring the Role of Reasoning Structures for Constructing Proofs in Multi-Step Natural Language Reasoning with Large Language Models
Zi'ou Zheng, Christopher Malon, Martin Renqiang Min, Xiaodan Zhu
TL;DR
The paper investigates whether large language models can be guided, via in-context learning, to construct structured proof graphs for complex multi-step natural language reasoning. It introduces a six-component framework (Structure-aware Demonstration, Candidate Retrieval, Reasoning Step Proposal, Reasoning Step Evaluation, Proof Hint Generation, and Structure-aware Pruning) coupled with a BFS-beam search and diversity pruning to search over candidate proofs. Experiments on EntailmentBank, AR-LSAT, and PrOntoQA with GPT-3.5/4 and open-source LLMs show that incorporating proof-structure awareness improves evidence accuracy and graph similarity, with notable gains in non-sequential reasoning. The work highlights the practical benefits for explainability and reasoning reliability, while acknowledging costs and limitations such as token usage and the current focus on English, single-domain natural language reasoning.
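To make the search component concrete, the following is a minimal sketch of a breadth-first beam search with diversity pruning. It is an illustration under stated assumptions, not the authors' implementation: the function `beam_search_with_diversity` and the callables `propose`, `score`, and `is_similar` are hypothetical stand-ins for the LLM-driven step proposal, step evaluation, and structure-aware pruning components, and the toy functions at the bottom exist only to make the example runnable.

```python
from typing import Callable, List, Tuple

# A partial proof is modelled as a tuple of step labels (an assumption for this sketch).
State = Tuple[str, ...]


def beam_search_with_diversity(
    start: State,
    propose: Callable[[State], List[str]],
    score: Callable[[State], float],
    is_similar: Callable[[State, State], bool],
    beam_width: int = 2,
    max_depth: int = 3,
) -> List[State]:
    """Level-by-level beam search that drops candidates similar to ones already kept."""
    beam = [start]
    for _ in range(max_depth):
        # Expand every state in the current beam by one proposed step.
        candidates = [state + (step,) for state in beam for step in propose(state)]
        if not candidates:
            break
        # Rank candidates from best to worst (ties keep their original order).
        candidates.sort(key=score, reverse=True)
        # Diversity pruning: keep a candidate only if no kept candidate is similar to it.
        kept: List[State] = []
        for cand in candidates:
            if all(not is_similar(cand, k) for k in kept):
                kept.append(cand)
            if len(kept) == beam_width:
                break
        beam = kept
    return beam


# Toy stand-ins (hypothetical): two possible steps, prefer proofs with more "a" steps,
# and treat proofs that use the same steps in a different order as similar.
def toy_propose(state: State) -> List[str]:
    return ["a", "b"]


def toy_score(state: State) -> float:
    return float(state.count("a"))


def toy_is_similar(x: State, y: State) -> bool:
    return sorted(x) == sorted(y)


result = beam_search_with_diversity(
    start=(),
    propose=toy_propose,
    score=toy_score,
    is_similar=toy_is_similar,
    beam_width=3,
    max_depth=2,
)
print(result)
```

In this toy run, the candidate `("b", "a")` is pruned because it uses the same steps as the already kept `("a", "b")`, which frees a beam slot for the lower-scoring but distinct `("b", "b")`. In the setting the paper describes, the scoring and similarity judgments would come from the LLM-based components rather than from hand-written functions.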
Abstract
When performing complex multi-step reasoning tasks, the ability of Large Language Models (LLMs) to derive structured intermediate proof steps is important for ensuring that the models truly perform the desired reasoning and for improving models' explainability. This paper is centred around a focused study: whether the current state-of-the-art generalist LLMs can leverage the structures in a few examples to better construct the proof structures with in-context learning. Our study specifically focuses on structure-aware demonstration and structure-aware pruning. We demonstrate that they both help improve performance. A detailed analysis is provided to help understand the results.
