ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang, Chang Xu
TL;DR
This work identifies intrinsic reasoning in pre-trained LLMs as linearly separable in activation space, enabling a simple linear classifier to detect thoughtful responses. Building on this, ThoughtProbe performs classifier-guided beam search over a tree-like thought space, using classifier logits as rewards and a branch-aggregation scheme to select optimal answers. The approach yields significant improvements on several arithmetic reasoning benchmarks across multiple LLMs, demonstrating robust cross-model gains and scalable inference-time reasoning without human prompts. By combining linear probing, guided exploration, and value aggregation, ThoughtProbe provides a practical framework to harness intrinsic reasoning in LLMs with broad applicability to multi-step problem solving.
Abstract
Pre-trained large language models (LLMs) have been shown to possess intrinsic reasoning capabilities that can emerge naturally when the response space is expanded. However, the neural representation mechanisms underlying these capabilities, and the best ways to exploit them, remain poorly understood. In this work, we make the key discovery that a simple linear classifier can effectively detect intrinsic reasoning capabilities in LLMs' activation space, particularly within specific representation types and network layers. Based on this finding, we propose a classifier-guided search framework that strategically explores a tree-structured response space. At each node expansion, the classifier serves as a scoring and ranking mechanism that allocates computational resources efficiently by identifying and prioritizing the more thoughtful reasoning directions for continuation. After the tree expansion is complete, we collect answers from all branches to form a candidate answer pool. We propose a branch-aggregation selection method that marginalizes over all supporting branches by aggregating their thoughtfulness scores, thereby identifying the optimal answer from the pool. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
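To make the branch-aggregation step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each completed branch yields a final answer string and a scalar thoughtfulness score (for example, a classifier logit), and it assumes that "aggregating" means summing the scores of all branches that support the same answer. The function name `aggregate_branches` and the example values are hypothetical.

```python
from collections import defaultdict


def aggregate_branches(branches):
    """Pick the answer with the highest total score across its supporting branches.

    `branches` is an iterable of (answer, score) pairs, one per completed branch.
    Summation is an assumed aggregation rule, used here for illustration only.
    """
    totals = defaultdict(float)
    for answer, score in branches:
        totals[answer] += score  # marginalize over branches that reach this answer
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Hypothetical candidate pool: two branches support "18", one each for "20" and "17".
branches = [("18", 1.5), ("20", 2.0), ("18", 1.0), ("17", 0.5)]
best_answer, totals = aggregate_branches(branches)
print(best_answer, totals)
```

In this toy pool, the single highest-scoring branch answers "20", but "18" is selected because its two supporting branches together accumulate a larger total. This is the behavior that distinguishes aggregation over branches from simply picking the top-scoring branch.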
