ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning

Zijian Wang, Chang Xu

TL;DR

This work identifies intrinsic reasoning in pre-trained LLMs as linearly separable in activation space, enabling a simple linear classifier to detect thoughtful responses. Building on this, ThoughtProbe performs classifier-guided beam search over a tree-like thought space, using classifier logits as rewards and a branch-aggregation scheme to select optimal answers. The approach yields significant improvements on several arithmetic reasoning benchmarks across multiple LLMs, demonstrating robust cross-model gains and scalable inference-time reasoning without human prompts. By combining linear probing, guided exploration, and value aggregation, ThoughtProbe provides a practical framework to harness intrinsic reasoning in LLMs with broad applicability to multi-step problem solving.
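The linear-separability claim can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the "activations" are synthetic Gaussian vectors shifted along a random direction, and the helper names `train_linear_probe` and `probe_logit` are hypothetical, not taken from the paper. The probe is plain logistic regression trained by gradient descent, and its logit plays the role of a thoughtfulness score.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe by gradient descent; returns (w, b)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = X @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - y                  # gradient of cross-entropy w.r.t. logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_logit(w, b, h):
    """Logit of the probe for activation(s) h; higher means 'more thoughtful'."""
    return h @ w + b

# Synthetic stand-in for hidden activations (assumption: NOT real LLM features).
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
thoughtful = rng.normal(size=(200, d)) + 2.0 * direction   # label 1
intuitive = rng.normal(size=(200, d)) - 2.0 * direction    # label 0
X = np.vstack([thoughtful, intuitive])
y = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_linear_probe(X, y)
accuracy = ((probe_logit(w, b, X) > 0) == (y == 1)).mean()
print(f"training accuracy: {accuracy:.3f}")
```

In practice the rows of `X` would be activations extracted from a chosen layer and representation type of the LLM; which layer and type work best is an empirical question that the paper addresses.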

Abstract

Pre-trained large language models (LLMs) have been demonstrated to possess intrinsic reasoning capabilities that can emerge naturally when expanding the response space. However, the neural representation mechanisms underlying these intrinsic capabilities and approaches for their optimal utilization remain inadequately understood. In this work, we make the key discovery that a simple linear classifier can effectively detect intrinsic reasoning capabilities in LLMs' activation space, particularly within specific representation types and network layers. Based on this finding, we propose a classifier-guided search framework that strategically explores a tree-structured response space. At each node expansion, the classifier serves as a scoring and ranking mechanism that efficiently allocates computational resources by identifying and prioritizing more thoughtful reasoning directions for continuation. After completing the tree expansion, we collect answers from all branches to form a candidate answer pool. We propose a branch-aggregation selection method that marginalizes over all supporting branches by aggregating their thoughtfulness scores, thereby identifying the optimal answer from the pool. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
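The branch-aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions: each branch is reduced to one scalar thoughtfulness score, and aggregation is a plain sum over the branches that support the same answer. The abstract does not specify how per-branch scores are computed or combined, so `select_answer` and the toy scores are hypothetical.

```python
from collections import defaultdict

def select_answer(branches):
    """Pick the answer whose supporting branches have the largest total score.

    branches: iterable of (answer, score) pairs, one pair per completed branch.
    Returns (best_answer, totals) where totals maps answer -> summed score.
    """
    totals = defaultdict(float)
    for answer, score in branches:
        totals[answer] += score           # aggregate over supporting branches
    best = max(totals, key=totals.get)
    return best, dict(totals)

# Toy answer pool (assumed values): "17" has the single best branch,
# but "42" wins once its two supporting branches are aggregated.
branches = [("42", 1.5), ("17", 2.0), ("42", 1.0), ("17", -0.5), ("8", 0.3)]
best, totals = select_answer(branches)
print(best, totals)
```

The toy pool shows why aggregation can differ from picking the single highest-scoring branch: support spread across several branches can outweigh one strong but isolated branch.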

Paper Structure

This paper contains 25 sections, 4 theorems, 12 equations, 13 figures, 3 tables.

Key Result

Theorem 3.1

Let $l(x)$ be the logit value of a binary classifier trained on preference data derived from the Bradley-Terry model (Bradley and Terry, 1952), where preference pairs are treated as binary classification data. Let $r(x)$ be the reward function in the original Bradley-Terry model. Then the logit preserves the preference ordering induced by the reward, i.e. for any two responses $x_1$ and $x_2$, $l(x_1) > l(x_2) \iff r(x_1) > r(x_2)$ (see appendix for proof).
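For context, the standard form of the Bradley-Terry model that the theorem builds on can be written out. This is textbook background rather than the paper's proof, which is in its appendix; the symbols $x_1$, $x_2$ for two responses and $\sigma$ for the logistic function are notation assumed here.

$$P(x_1 \succ x_2) = \frac{\exp(r(x_1))}{\exp(r(x_1)) + \exp(r(x_2))} = \sigma\big(r(x_1) - r(x_2)\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

Because $\sigma$ is strictly increasing and $\sigma(0) = 1/2$, the model prefers $x_1$ with probability above $1/2$ exactly when $r(x_1) > r(x_2)$. Any score that ranks responses in the same order as $r$ therefore induces the same preferences, which is why an order-preserving logit can stand in for the reward when ranking candidates.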

Figures (13)

  • Figure 1: Pre-trained LLMs could naturally generate both thoughtful and non-thoughtful responses when sampling multiple times, with these responses being linearly separable in the activation space.
  • Figure 2: Classification performance (F1-Score and AUC-ROC) of linear classifiers across different representation types and LLMs.
  • Figure 3: Mean logit values and variance regions along the token sequence in Phi-1.5. Left: Comparison between lengthy thoughtful correct responses and concise incorrect intuitive responses. Right: Comparison between lengthy thoughtful correct responses and lengthy incorrect responses.
  • Figure 4: Our classifier-guided tree exploration framework. At each parent node, multiple candidates are sampled and evaluated by a pre-trained classifier in activation space. Nodes are selected for further expansion based on thoughtfulness scores. Each exploration branch produces a candidate answer, forming an answer pool from which the final answer is determined through marginalization across all branches.
  • Figure 5: Accuracy under different choices of expansion depth and beam width.
  • ...and 8 more figures
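The exploration loop described in the Figure 4 caption can be sketched as a small beam search. This is a sketch under stated assumptions, not the authors' implementation: `expand` and `score` are hypothetical stand-ins for sampling continuations from the LLM and for the classifier logit on a node's activations, and ranking by the cumulative score along a path is one possible choice that the summary does not confirm.

```python
def guided_beam_search(root, expand, score, beam_width, depth):
    """Expand a tree level by level, keeping the top `beam_width` paths.

    expand(node) -> list of child nodes (stand-in for LLM sampling).
    score(node)  -> float (stand-in for the classifier logit).
    Returns a list of (node, cumulative_score), best first.
    """
    beam = [(root, 0.0)]
    for _ in range(depth):
        candidates = []
        for node, total in beam:
            for child in expand(node):
                candidates.append((child, total + score(child)))
        if not candidates:                # nothing left to expand
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]    # prune to the most promising paths
    return beam

# Toy thought space (assumption): nodes are strings, each step appends "a" or "b",
# and the toy scorer rewards steps ending in "a".
def toy_expand(node):
    return [node + "a", node + "b"]

def toy_score(node):
    return 1.0 if node.endswith("a") else -1.0

beam = guided_beam_search("", toy_expand, toy_score, beam_width=2, depth=3)
print(beam)
```

Each surviving path would then contribute one candidate answer to the answer pool. Beam width and depth trade exploration coverage against compute, which is the trade-off Figure 5 examines.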

Theorems & Definitions (6)

  • Theorem 3.1: Logit-Reward Order Preservation
  • Lemma 1.1: Classification-Preference Connection
  • proof
  • Theorem 1.2: Classification-Reward Equivalence
  • proof
  • Corollary 1.3: Order Preservation