Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer
Jinghan Zhang, Fengran Mo, Tharindu Cyril Weerasooriya, Xinyue Ye, Dongjie Wang, Yanjie Fu, Kunpeng Liu
TL;DR
The paper tackles blind spots in large language model reasoning by introducing Thought Space Explorer (TSE), a framework for navigating and expanding thought structures. TSE identifies high-impact thought nodes, generates new nodes by fusing information from multiple chains, and expands branches to explore previously unconsidered regions of the solution space. It offers two variants for selecting and merging diverse reasoning paths: a gradient-based collaborative scoring mechanism and a gradient-agnostic prompting approach (self-prompting, semantic relevance, LLM-as-a-Judge). Empirical results on GSM8K, AIME, and GPQA-D show that TSE improves final-answer accuracy, path correctness, and reasoning diversity while maintaining a practical token-cost trade-off, which suggests potential for deployment in real-world reasoning tasks.
Abstract
Large language models have shown strong reasoning capabilities through chain-structured methods such as Chain-of-Thought. Recent studies optimize thought structures by generating parallel or tree-like structures, switching between long and short reasoning modes, or aligning reasoning steps with task performance. However, these approaches mainly follow the logical directions of previously generated chains and therefore leave regions of the solution space unexplored. We define these unexplored regions as blind spots, which limit the diversity and effectiveness of the reasoning process. To this end, we propose the "Thought Space Explorer" (TSE), a framework for navigating and expanding thought structures to overcome blind spots in LLM reasoning. TSE first identifies key nodes with high impact, then generates new nodes by integrating information from multiple chains, and finally extends new branches through connection strategies. We conduct a series of experiments on math and QA benchmarks. Compared with existing baseline methods, TSE improves the accuracy of both the final answer and the intermediate reasoning steps, while maintaining a better effectiveness-efficiency trade-off for practical deployment.
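The three stages named in the abstract (identify key nodes, generate a new node from several chains, extend a new branch) can be illustrated with a minimal sketch. This is not the authors' implementation, and every name in it is hypothetical: `relevance` is a toy word-overlap score standing in for a semantic-relevance selector, `generate` is a caller-supplied callable standing in for an LLM call, and attaching the new node to every selected key node is just one assumed connection strategy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One thought in a reasoning chain (hypothetical structure)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def relevance(question: str, thought: str) -> float:
    """Toy stand-in for a relevance score: Jaccard overlap of lowercase words."""
    q, t = set(question.lower().split()), set(thought.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def select_key_node(question: str, chain: list[Node]) -> Node:
    """Stage 1: pick the highest-scoring node in a chain (first one wins ties)."""
    return max(chain, key=lambda n: relevance(question, n.text))


def fuse(key_nodes: list[Node], generate) -> Node:
    """Stage 2: build a new node from key nodes drawn from several chains."""
    prompt = "Combine these thoughts into a new step:\n" + "\n".join(
        f"- {n.text}" for n in key_nodes
    )
    return Node(generate(prompt))


def explore(question: str, chains: list[list[Node]], generate) -> Node:
    """Stage 3: attach the fused node as a new branch under each key node."""
    key_nodes = [select_key_node(question, chain) for chain in chains]
    new_node = fuse(key_nodes, generate)
    for node in key_nodes:
        node.children.append(new_node)
    return new_node


def stub_llm(prompt: str) -> str:
    """Placeholder generator; a real system would query a language model here."""
    n_thoughts = len(prompt.splitlines()) - 1  # one line per key node after the header
    return f"new thought from {n_thoughts} key nodes"
```

Calling `explore(question, chains, stub_llm)` with two chains returns a single new node that hangs under one selected node per chain. In practice the scoring function and the generator would be replaced by whichever selection variant and model are actually in use.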
