ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations

Zijian Wang, Chang Xu

TL;DR

ThoughtProbe is an inference-time framework that uses discriminative signals in LLM hidden representations to guide tree-structured exploration of reasoning paths. It trains lightweight classifiers on probed representations, uses classifier-guided beam search to generate diverse reasoning paths, and applies branch aggregation to select the final answer, all without fine-tuning the LLM. Across multiple arithmetic benchmarks and LLMs, it outperforms prompting, sampling, and activation-steering baselines, which suggests that linear probing of representations is a practical way to guide reasoning at inference time.

Abstract

This paper introduces ThoughtProbe, a novel inference-time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance. Unlike previous works that manipulate hidden representations to steer LLM generation, we use them as discriminative signals to guide exploration of the tree-structured response space. At each node expansion, a classifier scores and ranks the candidates, allocating computation efficiently by prioritizing higher-scoring candidates for continuation. After the tree expansion is complete, we collect the answers from all branches to form a candidate answer pool. We then propose a branch aggregation method that marginalizes over all supporting branches by aggregating their CoT scores, thereby identifying the optimal answer in the pool. Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
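
The branch aggregation step described above can be illustrated with a minimal sketch. It assumes that each finished branch yields an (answer, CoT score) pair and that "aggregating" means summing the scores of all branches that support the same answer; the abstract does not fix the exact aggregation rule, and the function name `aggregate_branches` is hypothetical.

```python
from collections import defaultdict


def aggregate_branches(branches):
    """Pick the answer with the largest total score over its supporting branches.

    `branches` is an iterable of (answer, cot_score) pairs, one per finished
    branch. Summing the scores is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for answer, score in branches:
        totals[answer] += score  # marginalize over branches sharing an answer
    if not totals:
        raise ValueError("the answer pool is empty")
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Toy answer pool: two branches support "20", two support "18".
pool = [("18", 0.9), ("20", 0.6), ("20", 0.5), ("18", 0.1)]
best_answer, score_table = aggregate_branches(pool)
print(best_answer, score_table)
```

In this toy pool the single highest-scoring branch supports "18", yet "20" has the larger total once its two supporting branches are combined, which is the kind of case where aggregation differs from picking the top-scoring branch.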

Paper Structure

This paper contains 27 sections, 3 theorems, 13 equations, 13 figures, and 4 tables.

Key Result

Lemma A.1

For any instance $x \in \mathcal{X}$:

Figures (13)

  • Figure 1: Pre-trained LLMs could naturally generate both CoT and non-CoT responses when sampling multiple times, and hidden representations provide a strong signal for discriminating them.
  • Figure 2: Layer-wise classification performance (F1-Score and AUC-ROC) across different representation types and LLMs.
  • Figure 3: Mean logit values and variance regions along the token sequence. Left: Comparison between CoT and non-CoT responses. Right: Comparison between correct and incorrect CoT responses.
  • Figure 4: Our classifier-guided tree exploration framework. At each parent node, multiple candidates are sampled and scored by a pre-trained classifier that probes their hidden representations. Nodes are selected for further expansion based on these scores. Each exploration branch produces a candidate answer, forming an answer pool from which the final answer is determined through marginalization across all branches.
  • Figure 5: Accuracy when scaling the search space across different expansion depths and beam widths.
  • ...and 8 more figures
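
The exploration loop in the Figure 4 caption can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the probe is taken to be a linear classifier whose logit is `w · h + b`, the top `beam_width` candidates are kept globally across all parents (the caption does not say whether selection is per parent or global), and `sample_children` and `hidden_of` are hypothetical stand-ins for LLM sampling and hidden-state extraction.

```python
def probe_score(hidden, weights, bias):
    """Logit of a linear probe applied to one hidden-representation vector."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def expand_tree(root, sample_children, hidden_of, weights, bias,
                depth, n_samples, beam_width):
    """Classifier-guided expansion: sample children, score them, keep the best.

    Returns the final frontier as a list of (score, node) pairs.
    """
    frontier = [(0.0, root)]
    for _ in range(depth):
        candidates = []
        for _, node in frontier:
            for child in sample_children(node, n_samples):
                score = probe_score(hidden_of(child), weights, bias)
                candidates.append((score, child))
        if not candidates:
            break
        # Highest probe score first; keep only `beam_width` nodes to expand.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = candidates[:beam_width]
    return frontier


# Toy stand-ins (hypothetical): a node is a tuple of step ids, and its
# "hidden representation" is [last step id, number of steps so far].
def toy_sample_children(node, n):
    return [node + (i,) for i in range(n)]


def toy_hidden_of(node):
    return [float(node[-1]), float(len(node))]


final = expand_tree((), toy_sample_children, toy_hidden_of,
                    weights=[1.0, 0.0], bias=0.0,
                    depth=2, n_samples=3, beam_width=2)
print(final)
```

In a real setting each surviving branch would be decoded to a final answer and passed to the aggregation step; here the toy probe simply prefers nodes whose last step id is largest.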

Theorems & Definitions (6)

  • Lemma A.1: Classification-Preference Connection
  • proof
  • Theorem A.2: Logit implies reward ordering
  • proof
  • Theorem A.3: Logit is lower bounded by reward
  • proof