MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering
Guanming Xiong, Haochen Li, Wen Zhao
TL;DR
The paper presents MCTS-KBQA, a Monte Carlo Tree Search framework designed to enhance reasoning in knowledge-base question answering by using step-wise rewards derived from open-source instruction-tuned LLMs. It integrates a tailored action space for KB reasoning, discriminative evaluation of intermediate states, depth-aware backpropagation, and early stopping with leaf voting, while also extending KBQA datasets via distant supervision to provide richer reasoning traces. Empirical results on WebQSP, CWQ, and KQA Pro demonstrate clear gains over linear decision-making baselines, with substantial data-efficiency advantages on the extended dataset. The work further analyzes reward design, hyper-parameter effects, and LLM choices, offering practical guidance for building scalable, reasoning-aware KBQA systems.
Abstract
This study explores how to enhance the reasoning capabilities of large language models (LLMs) in knowledge base question answering (KBQA) by leveraging Monte Carlo Tree Search (MCTS). Semantic parsing-based KBQA is particularly challenging because it requires locating elements in a knowledge base and generating logical forms, which demands not only extensive annotated data but also strong reasoning capabilities. Although recent approaches that use LLMs as agents have demonstrated considerable potential, they are inherently constrained by their linear decision-making processes. To address this limitation, we propose an MCTS-based framework that enhances LLMs' reasoning capabilities through tree search. We design a step-wise reward mechanism that requires only direct prompting of open-source instruction-tuned LLMs, without additional fine-tuning. Experimental results demonstrate that our approach significantly outperforms linear decision-making methods, particularly in low-resource scenarios. Additionally, we contribute new data resources to the KBQA community by annotating intermediate reasoning processes for existing question-SPARQL datasets using distant supervision. Experimental results on the extended dataset demonstrate that our method achieves performance comparable to fully supervised models while using significantly less training data.
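To make the contrast with linear decision-making concrete, the following is a minimal sketch of a generic MCTS loop driven by a step-wise reward. It is not the authors' implementation: the names `mcts`, `Node`, `toy_actions`, and `toy_reward` are hypothetical, the reward function is a toy stand-in for prompting an LLM to score an intermediate state, and backpropagation is plain averaging rather than the depth-aware variant mentioned in the TL;DR. Early stopping and leaf voting are omitted; the sketch simply returns the highest-reward state it evaluated.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]  # a partial reasoning trace: the actions taken so far


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0


def uct(child: Node, parent_visits: int, c: float) -> float:
    """Upper confidence bound for trees; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(
    root_state: State,
    actions_fn: Callable[[State], List[str]],
    reward_fn: Callable[[State], float],
    n_iters: int = 50,
    c: float = 1.4,
) -> Tuple[State, float]:
    root = Node(root_state)
    best_state, best_reward = root_state, -math.inf
    for _ in range(n_iters):
        # Selection: descend by UCT until reaching a node with no children.
        node = root
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits, c))
        # Expansion: add one child per available action (none if terminal).
        for action in actions_fn(node.state):
            node.children.append(Node(node.state + (action,), parent=node))
        if node.children:
            node = node.children[0]
        # Evaluation: score the intermediate state directly (step-wise reward),
        # instead of rolling out to the end of the episode.
        reward = reward_fn(node.state)
        if reward > best_reward:
            best_state, best_reward = node.state, reward
        # Backpropagation: plain averaging (assumption; not depth-aware).
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_state, best_reward


# Toy problem (hypothetical): traces of at most three steps over {"a", "b"},
# where the reward is the fraction of "b" steps.
def toy_actions(state: State) -> List[str]:
    return ["a", "b"] if len(state) < 3 else []


def toy_reward(state: State) -> float:
    return state.count("b") / 3


if __name__ == "__main__":
    print(mcts((), toy_actions, toy_reward, n_iters=20))
```

A linear agent would commit to one action per step and never revisit it. The tree search instead keeps visit statistics for every branch it has opened, so a poorly scored step can be abandoned in favor of a sibling on a later iteration.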
