
Robust Search with Uncertainty-Aware Value Models for Language Model Reasoning

Fei Yu, Yingru Li, Benyou Wang

TL;DR

The paper tackles verifier failure in value-model-guided search for language-model reasoning by introducing uncertainty-aware value modelling. It presents Uncertainty-Aware Value Models (UVMs), which output posterior value distributions via the Ensemble++ framework, and a Group Thompson Sampling strategy that selects candidates based on their probability of being optimal. Empirical results on in-distribution (ID) and out-of-distribution (OOD) benchmarks show improved solution coverage, particularly under distribution shift, with a noted trade-off in precision under majority voting. The work demonstrates robust, uncertainty-aware search with modest overhead and provides code to facilitate adoption and further research in uncertainty quantification for LLM search.

Abstract

Value-model-guided search is effective in steering LLM generation but suffers from a lack of robustness. This is due to verifier failure: imperfect value models (VMs) mistakenly prune valid reasoning paths, especially when encountering unseen reasoning paths generated during search. To address this, we propose an uncertainty-aware framework with two key components: (1) Uncertainty-Aware Value Models (UVMs), which replace single-point value estimates with value distributions to quantify prediction reliability, and (2) Group Thompson Sampling, an efficient algorithm that selects candidates based on their probability of being optimal. Experiments on two In-Distribution (ID) settings (GSM8K, MATH) and three Out-Of-Distribution (OOD) settings (e.g., AIME25, Minerva Math) show that our method significantly mitigates verifier failure and boosts solution coverage, especially on OOD problems. This work provides the first systematic integration of uncertainty quantification into LLM search paradigms, enhancing robustness. The code is released at https://github.com/FreedomIntelligence/UVM.
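The idea of selecting candidates by their probability of being optimal can be illustrated with a small Monte Carlo sketch. This is not the paper's Group Thompson Sampling algorithm, whose details are not given in the abstract. It is a simplified stand-in that assumes each candidate already has a set of posterior value draws, estimates how often each candidate has the highest draw, and keeps the `k` candidates with the largest estimated probability. The function names `estimate_prob_optimal` and `select_group` are hypothetical.

```python
import numpy as np

def estimate_prob_optimal(posterior_samples):
    """Estimate P(candidate is best) from joint posterior draws.

    posterior_samples: array of shape (n_draws, n_candidates), where each
    row is one joint draw of values for all candidates (an assumed input
    format, not taken from the paper).
    """
    samples = np.asarray(posterior_samples, dtype=float)
    winners = samples.argmax(axis=1)  # best candidate in each draw
    counts = np.bincount(winners, minlength=samples.shape[1])
    return counts / samples.shape[0]

def select_group(posterior_samples, k):
    """Keep the k candidates with the highest estimated P(optimal)."""
    probs = estimate_prob_optimal(posterior_samples)
    order = np.argsort(-probs, kind="stable")  # ties keep original order
    return order[:k].tolist(), probs

# Toy example: candidate 1 has a lower typical value than candidate 0 but
# wins in half of the draws, so it survives selection. Candidate 2 never wins.
draws = [
    [0.9, 0.10, 0.2],
    [0.8, 0.95, 0.1],
    [0.7, 0.20, 0.1],
    [0.3, 0.60, 0.1],
]
selected, probs = select_group(draws, k=2)
```

A point-estimate verifier that ranks by mean value alone could prune candidate 1 whenever the beam is narrow. Ranking by the estimated probability of being optimal keeps it because its value is uncertain rather than reliably low.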

Paper Structure

This paper contains 53 sections, 8 equations, 1 figure, 5 tables, 2 algorithms.

Figures (1)

  • Figure 1: Illustration of the UVM structure, its value learning process, and its relationship to OVM. (i) UVM extends OVM by adding an uncertainty term with minimal additional parameters. The blue branch represents OVM, which computes a mean value for a sequence. The orange branch introduces the uncertainty term in UVM, calculated using parameters $\mathbf{W}$ and $\mathbf{W}_0$. This uncertainty term varies with the input index $\boldsymbol{\zeta}$, which leads to diverse posterior value samples. Using UVM is simple: it derives a mean value as OVM does, but also samples from a fixed distribution and adds the uncertainty term. (ii)-(iii) For training, UVM uses the same training set as OVM but, rather than estimating a single value, samples $2m$ posterior values $[v_1,\dots,v_{2m}]$ using a discrete coordinate distribution $[\boldsymbol{e}_1,\dots,\boldsymbol{e}_{2m}]$. The model is trained by averaging the MSE over these posterior samples.
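The structure described in the caption can be sketched as a toy value head. The caption does not give the exact form of the uncertainty term, so this sketch assumes the value is `w_mean · h + ζᵀ (W + W0) h`, with `W` treated as trainable and `W0` as a fixed random prior. The class `ToyUVMHead`, the helper `averaged_mse`, and the hidden feature vector `h` are all hypothetical, and `index_dim` plays the role of $2m$.

```python
import numpy as np

class ToyUVMHead:
    """Toy value head: mean branch plus an index-dependent uncertainty branch.

    Assumed form (not confirmed by the caption):
        v(h, zeta) = w_mean . h + zeta^T (W + W0) h
    """

    def __init__(self, hidden_dim, index_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_mean = rng.normal(scale=0.1, size=hidden_dim)  # OVM-like mean branch
        self.W = np.zeros((index_dim, hidden_dim))  # assumed trainable part
        self.W0 = rng.normal(scale=0.1, size=(index_dim, hidden_dim))  # assumed fixed prior

    def mean_value(self, h):
        return float(self.w_mean @ h)

    def sample_value(self, h, zeta):
        # Mean value plus the uncertainty term selected by the index zeta.
        return self.mean_value(h) + float(zeta @ (self.W + self.W0) @ h)

def averaged_mse(head, h, target):
    """Average the squared error over one posterior value per coordinate index."""
    index_dim = head.W.shape[0]
    coords = np.eye(index_dim)  # e_1, ..., e_{index_dim}
    values = np.array([head.sample_value(h, e) for e in coords])
    return float(np.mean((values - target) ** 2)), values

head = ToyUVMHead(hidden_dim=4, index_dim=6)  # index_dim = 2m with m = 3
h = np.ones(4)  # stand-in for a sequence's hidden features
loss, values = averaged_mse(head, h, target=1.0)
```

At search time, an index drawn from a fixed distribution (for example a standard Gaussian, which is an assumption here) would replace the coordinate vectors, so repeated calls to `sample_value` give different posterior value samples for the same sequence.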