Distance Is All You Need: Radial Dispersion for Uncertainty Estimation in Large Language Models
Manh Nguyen, Sunil Gupta, Hung Le
TL;DR
This paper tackles the challenge of reliably estimating uncertainty in open-ended LLM outputs. It proposes the Radial Dispersion Score (RDS), a simple, parameter-free metric computed from external embeddings that captures the spread of sampled generations projected onto a unit sphere, together with a probability-weighted variant (RDS_w) that leverages token probabilities when available. The authors show that RDS and RDS_w outperform a broad set of baselines across multiple datasets and models, and that the per-sample scores enable effective best-of-N selection and confidence-based filtering. The approach is model-agnostic, scalable, and robust to the choice of embedding model and the sampling budget, offering a practical tool for reducing hallucinations and improving decision-making with LLMs. Limitations include the need for an external encoder and multiple samples, particularly in black-box settings.
Abstract
Detecting when large language models (LLMs) are uncertain is critical for building reliable systems, yet existing methods are overly complicated, relying on brittle semantic clustering or access to internal model states. We introduce \textbf{Radial Dispersion Score (RDS)}, a simple, parameter-free, fully model-agnostic uncertainty metric that measures the radial dispersion of sampled generations in embedding space. A lightweight probability-weighted variant further incorporates the model's own token probabilities when available and outperforms nine strong baselines. Moreover, RDS naturally extends to per-sample scoring, enabling applications such as best-of-$N$ selection and confidence-based filtering. Across four challenging free-form QA datasets and multiple LLMs, our metrics achieve state-of-the-art hallucination detection and answer selection performance, while remaining robust and scalable with respect to sample size and embedding choice.
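The exact formula is not given in this section, so the following is only a minimal sketch of one plausible reading of "radial dispersion", not the authors' implementation. It assumes each sampled answer has already been embedded by some external encoder, projects the embeddings onto the unit sphere, and scores the set by the mean Euclidean distance of the points from their centroid. Each sample's own distance then acts as a per-sample score, and best-of-$N$ picks the most central answer. The names `radial_dispersion` and `best_of_n`, the centroid-distance formula, and the use of normalized sequence probabilities as `weights` for the probability-weighted variant are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _to_unit_sphere(v: Sequence[float]) -> List[float]:
    """L2-normalize an embedding so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("a zero embedding cannot be projected onto the unit sphere")
    return [x / norm for x in v]


def radial_dispersion(
    embeddings: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, List[float]]:
    """Hypothetical RDS-style score: (weighted) mean distance of unit-normalized
    embeddings from their (weighted) centroid.

    Returns the set-level dispersion and each sample's distance to the centroid.
    `weights` is ASSUMED to hold per-sample probabilities (e.g. sequence
    likelihoods); when given, both the centroid and the mean are weighted.
    """
    if not embeddings:
        raise ValueError("need at least one sampled generation")
    points = [_to_unit_sphere(e) for e in embeddings]
    n, dim = len(points), len(points[0])
    if any(len(p) != dim for p in points):
        raise ValueError("all embeddings must have the same dimension")

    if weights is None:
        w = [1.0 / n] * n  # uniform weights give the unweighted score
    else:
        if len(weights) != n or any(x < 0 for x in weights) or sum(weights) == 0:
            raise ValueError("weights must be non-negative, one per sample, not all zero")
        total = sum(weights)
        w = [x / total for x in weights]  # renormalize so the weights sum to 1

    # Weighted centroid of the points; it generally lies inside the sphere.
    centroid = [sum(w[i] * points[i][d] for i in range(n)) for d in range(dim)]
    # Per-sample distance from the centroid: larger means more atypical.
    distances = [math.dist(p, centroid) for p in points]
    score = sum(wi * di for wi, di in zip(w, distances))
    return score, distances


def best_of_n(
    embeddings: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the sample closest to the centroid (assumed rule)."""
    _, distances = radial_dispersion(embeddings, weights)
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    # Toy 2-D "embeddings": two answers agree, one points elsewhere.
    answers = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
    score, dists = radial_dispersion(answers)
    print(round(score, 4))      # 0.6285
    print(best_of_n(answers))   # 0
```

Normalizing onto the unit sphere makes the score insensitive to embedding magnitude. Under this assumed form, the score lies in $[0, 1]$: it is 0 when all (positively weighted) samples share the same direction and grows as the answers spread apart.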
