NEX: Neuron Explore-Exploit Scoring for Label-Free Chain-of-Thought Selection and Model Ranking

Kang Chen, Zhuoka Feng, Sihan Zhao, Kai Xiong, Junjie Nian, Yaoning Wang, Changyi Xiao, Yixin Cao

TL;DR

This paper proposes NEX, a white-box, label-free, unsupervised scoring framework that views reasoning as alternating E-phase (exploration) and X-phase (exploitation). It also shows that entropy-based exploration proxies follow an inverted-U relationship with accuracy, suggesting that extra exploration can become redundant and induce overthinking.

Abstract

Large language models increasingly spend inference compute sampling multiple chain-of-thought traces or searching over merged checkpoints. This shifts the bottleneck from generation to selection, often without supervision on the target distribution. We show entropy-based exploration proxies follow an inverted-U with accuracy, suggesting extra exploration can become redundant and induce overthinking. We propose NEX, a white-box label-free unsupervised scoring framework that views reasoning as alternating E-phase (exploration) and X-phase (exploitation). NEX detects E-phase as spikes in newly activated MLP neurons per token from sparse activation caches, then uses a sticky two-state HMM to infer E-X phases and credits E-introduced neurons by whether they are reused in the following X span. These signals yield interpretable neuron weights and a single Good-Mass Fraction score to rank candidate responses and merged variants without task answers. Across reasoning benchmarks and Qwen3 merge families, NEX computed on a small unlabeled activation set predicts downstream accuracy and identifies better variants; we further validate the E-X signal with human annotations and provide causal evidence via "Effective-vs-Redundant" neuron transfer.
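
The first two steps described in the abstract (counting newly activated neurons per token, then segmenting with a sticky two-state HMM) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the function names `novelty_counts` and `sticky_viterbi`, the Gaussian emissions centred on the lower and upper quartiles, and the stay probability of 0.9. Neuron crediting and the Good-Mass Fraction score are not shown.

```python
import numpy as np

def novelty_counts(active_sets):
    """Count, for each token, the neurons that fire for the first time."""
    seen = set()
    counts = []
    for active in active_sets:
        new = set(active) - seen
        counts.append(len(new))
        seen |= new
    return np.array(counts, dtype=float)

def sticky_viterbi(x, stay=0.9):
    """Viterbi decode of a two-state HMM with a high self-transition ("sticky") prior.

    State 0 = X-phase (few new neurons), state 1 = E-phase (many new neurons).
    Emission model (assumed): Gaussians centred on the 25th/75th percentiles of x.
    """
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    sigma = x.std() + 1e-6
    log_emit = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2  # shape (T, 2)
    log_trans = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))

    T = len(x)
    score = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]

    # Backtrack the most likely state sequence.
    states = np.zeros(T, dtype=int)
    states[-1] = score[-1].argmax()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    return states

# Toy trace (synthetic): three tokens recruit new neurons, five reuse old ones, twice.
toy = ([range(0, 5), range(5, 10), range(10, 15)] + [range(0, 5)] * 5
       + [range(15, 20), range(20, 25), range(25, 30)] + [range(5, 10)] * 5)
counts = novelty_counts(toy)
phases = sticky_viterbi(counts)
print(counts.astype(int).tolist())
print(phases.tolist())
```

On this toy trace the decoded phases mark the two bursts of new neurons as E-phase and the reuse stretches as X-phase. The sticky prior is what discourages the decoder from flipping state on a single noisy token.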

Paper Structure (84 sections, 16 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: (a) Inverted-U relationship between accuracy and an entropy-only exploration proxy (average number of high-entropy rows). Across benchmarks and merged model variants, exploration is beneficial up to a point, after which additional high-entropy reasoning correlates with reduced accuracy. (b) Strong positive linear correlation between accuracy and the proposed NEX score across four benchmarks. The consistent linear fit demonstrates that, unlike raw entropy, NEX provides a monotonic signal where higher scores robustly predict better reasoning performance. (c) Schematic of the token--neuron dynamics curve: high-slope segments (red) correspond to E-phase, where the model recruits new neurons to branch into hypotheses, while low-slope segments (blue) correspond to X-phase, where the model reuses existing circuits to execute calculations.
  • Figure 2: Overview of the NEX algorithm. Left: Calculation of novelty-slope time series and E-X segmentation via sticky HMM. Middle: Progress via new-neuron reuse, consolidation (slope drop) and strength gating during E$\rightarrow$X cycles. Right: Scoring the neurons with normalized weights and the responses via neuron-weighted activation mass.
  • Figure 3: Average number of E-phase segments across model families. Left: Merged Qwen3-4B models with SLERP between Instruct and Thinking show smooth quadratic growth. Right: Trained Qwen3-4B-Base models by varying Instruct-Thinking data mixtures exhibit moderate growth with higher variance.
  • Figure 4: Exploration-accuracy relationship across 19 Qwen3-4B models. Left: Mean E-phase segments vs. accuracy exhibits an inverted-U ($R^2 = 0.90$), with optimal performance at ${\sim}11.6$ segments. Right: NEX score vs. accuracy shows strong positive correlation ($r = 0.886$), demonstrating that neuron weighting linearizes the relationship. Markers: merged models (blue circles), Instruct baseline (magenta square), Thinking baseline (orange triangle), best model (star).
  • Figure 5: Sample efficiency of NEX for model selection. We train NEX weights on $N$ problems and compute the correlation between NEX scores and accuracies across all merged models. Correlation (blue, left) saturates rapidly; Regret@1 (orange, right) declines steadily. Scatter: means over 10 seeds; curves: exponential saturation fits.
  • ...and 4 more figures
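
The inverted-U captions above report an optimal number of E-phase segments. One plausible way to read such an optimum off the data is to fit a quadratic and take its vertex. The sketch below assumes that approach and uses synthetic numbers with a peak placed at 10 segments; it does not reproduce the paper's data or its fitting procedure, and the helper name `inverted_u_peak` is hypothetical.

```python
import numpy as np

def inverted_u_peak(x, y):
    """Fit y = a*x^2 + b*x + c and return (peak_x, a); a < 0 means inverted-U."""
    a, b, _c = np.polyfit(x, y, 2)
    return -b / (2 * a), a

# Synthetic example: accuracy peaks at 10 segments (made-up values).
segments = np.linspace(4, 16, 13)
accuracy = 0.70 - 0.01 * (segments - 10.0) ** 2
peak, curvature = inverted_u_peak(segments, accuracy)
print(round(peak, 3), curvature < 0)
```

A negative leading coefficient confirms the inverted-U shape, and the vertex gives the exploration level beyond which additional E-phase segments coincide with lower accuracy in the fitted curve.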