GraphMind: Theorem Selection and Conclusion Generation Framework with Dynamic GNN for LLM Reasoning
Yutong Li, Yitian Zhou, Xudong Wang, GuoChen, Caiyan Qin
TL;DR
GraphMind tackles the challenge of evolving intermediate reasoning in LLM-based multi-step deduction by modeling the process as a dynamic heterogeneous graph. It tightly couples a relational GNN for state encoding with a semantic theorem matcher and an LLM that generates conclusions, all in a closed loop that expands the graph at each step. The approach yields consistent improvements over strong prompting baselines across mathematics, finance, and law QA tasks, and ablations confirm the critical role of the GNN in capturing inter-premise dependencies. The framework offers a principled path toward interpretable, context-aware, and scalable reasoning for complex deductive tasks, with potential impact on formal reasoning, mathematical proof construction, and domain-specific AI assistants that require structured, verifiable reasoning traces.
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, including multi-step reasoning tasks such as mathematical proving. However, existing approaches often lack an explicit, dynamic mechanism for structurally representing and evolving intermediate reasoning states, which limits their ability to perform context-aware theorem selection and iterative conclusion generation. To address these challenges, we propose GraphMind, a novel dynamic graph-based framework that integrates a graph neural network (GNN) with LLMs to iteratively select theorems and generate intermediate conclusions for multi-step reasoning. Our method models the reasoning process as a heterogeneous evolving graph, where nodes represent conditions, theorems, and conclusions, and edges capture the logical dependencies between them. By encoding the current reasoning state with a GNN and leveraging semantic matching for theorem selection, our framework enables context-aware, interpretable, and structured reasoning in a closed-loop manner. Experiments on various question-answering (QA) datasets demonstrate that GraphMind achieves consistent performance improvements and significantly outperforms existing baselines in multi-step reasoning, validating the effectiveness and generalizability of our approach.
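To make the closed loop described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a typed graph with condition, theorem, and conclusion nodes, replaces the relational GNN with simple mean aggregation over incoming edges, uses cosine similarity as the semantic matcher, and stubs the LLM with a plain callable. All names (`ReasoningGraph`, `encode_state`, `select_theorem`, `reasoning_step`), the two-dimensional toy embeddings, and the theorem library are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field

NODE_TYPES = ("condition", "theorem", "conclusion")


@dataclass
class ReasoningGraph:
    """Heterogeneous graph: typed nodes plus directed dependency edges."""
    node_type: list = field(default_factory=list)
    text: list = field(default_factory=list)
    feat: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst) pairs

    def add_node(self, ntype, text, feat):
        assert ntype in NODE_TYPES
        self.node_type.append(ntype)
        self.text.append(text)
        self.feat.append(list(feat))
        return len(self.text) - 1

    def add_edge(self, src, dst):
        self.edges.append((src, dst))


def encode_state(graph, rounds=2):
    """Toy stand-in for the GNN: average each node with its in-neighbors,
    then mean-pool all nodes into one state vector."""
    h = [list(f) for f in graph.feat]
    dim = len(h[0])
    for _ in range(rounds):
        new_h = []
        for v in range(len(h)):
            group = [h[s] for s, d in graph.edges if d == v] + [h[v]]
            new_h.append([sum(x[i] for x in group) / len(group) for i in range(dim)])
        h = new_h
    return [sum(x[i] for x in h) / len(h) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_theorem(state, library):
    """Semantic matching: pick the theorem whose embedding is closest to the state."""
    return max(library, key=lambda name: cosine(state, library[name]))


def reasoning_step(graph, library, generate_conclusion):
    """One closed-loop step: encode, select a theorem, generate, expand the graph."""
    state = encode_state(graph)
    name = select_theorem(state, library)
    # Assumption: every non-theorem node so far is offered as a premise.
    premises = [i for i, t in enumerate(graph.node_type) if t != "theorem"]
    thm = graph.add_node("theorem", name, library[name])
    for p in premises:
        graph.add_edge(p, thm)
    conclusion = generate_conclusion([graph.text[p] for p in premises], name)
    # Placeholder feature: reuse the theorem embedding for the new conclusion node.
    concl = graph.add_node("conclusion", conclusion, library[name])
    graph.add_edge(thm, concl)
    return name, conclusion


# Toy usage with made-up embeddings and a stubbed conclusion generator.
g = ReasoningGraph()
g.add_node("condition", "a = 3", [1.0, 0.0])
g.add_node("condition", "b = 4", [1.0, 0.0])
library = {"pythagorean": [1.0, 0.1], "compound_interest": [0.0, 1.0]}
chosen, conclusion = reasoning_step(
    g, library, lambda premises, thm: f"apply {thm} to {premises}"
)
print(chosen, "|", conclusion)
```

Calling `reasoning_step` repeatedly grows the graph by one theorem node and one conclusion node per step, so later selections are conditioned on earlier conclusions. In a full system the encoder would be a trained relational GNN and `generate_conclusion` would wrap an LLM call; both are simplified here.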
