Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models

Haoyi Song; Ruihan Ji; Naichen Shi; Fan Lai; Raed Al Kontar

TL;DR

This work tackles the challenge of uncertainty quantification for large language models, where token probabilities and simple perturbations often fail to reflect true uncertainty. It introduces Inv-Entropy, a fully probabilistic measure derived from a dual random-walk model that links perturbed inputs to outputs via embeddings and similarity-based transitions, and uses bootstrapping to estimate the conditional entropy $H(X|Y)$ of inputs given an output. The framework is augmented with GAAP, a genetic-algorithm-based perturbation method, and a new evaluation metric, TSU, which assesses uncertainty without ground-truth correctness. Empirically, Inv-Entropy achieves state-of-the-art performance across multiple QA, knowledge, and math tasks on both black-box and gray-box LLMs, underscoring the framework’s flexibility and practical impact for reliable AI deployment.

Abstract

Large language models (LLMs) have transformed natural language processing, but their reliable deployment requires effective uncertainty quantification (UQ). Existing UQ methods are often heuristic and lack a probabilistic interpretation. This paper begins by providing a theoretical justification for the role of perturbations in UQ for LLMs. We then introduce a dual random walk perspective, modeling input-output pairs as two Markov chains with transition probabilities defined by semantic similarity. Building on this, we propose a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output through systematic perturbations. Within this framework, we define a new uncertainty measure, Inv-Entropy. A key strength of our framework is its flexibility: it supports various definitions of uncertainty measures, embeddings, perturbation strategies, and similarity metrics. We also propose GAAP, a perturbation algorithm based on genetic algorithms, which enhances the diversity of sampled inputs. In addition, we introduce a new evaluation metric, Temperature Sensitivity of Uncertainty (TSU), which directly assesses uncertainty without relying on correctness as a proxy. Extensive experiments demonstrate that Inv-Entropy outperforms existing semantic UQ methods. The code to reproduce the results can be found at https://github.com/UMDataScienceLab/Uncertainty-Quantification-for-LLMs.
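To make the two ingredients named above concrete, the following is a minimal sketch, not the paper's estimator. It shows (a) how a nonnegative similarity matrix can be row-normalized into Markov transition probabilities, and (b) how a conditional entropy $H(X|Y)$ is computed from a discrete joint table. The function names `transition_matrix` and `conditional_entropy`, the toy similarity values, and the use of natural logarithms are all assumptions made for illustration. How the paper builds its joint distribution from perturbed inputs, embeddings, and bootstrapping is not specified in this section and is not reproduced here.

```python
import numpy as np


def transition_matrix(sim):
    """Row-normalize a nonnegative similarity matrix so each row sums to 1.

    Illustrative assumption: sim[i, j] >= 0 and every row has a positive sum.
    """
    sim = np.asarray(sim, dtype=float)
    return sim / sim.sum(axis=1, keepdims=True)


def conditional_entropy(joint):
    """Return H(X|Y) in nats for a table with joint[i, j] proportional to P(X=i, Y=j)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalize to a proper joint distribution
    p_y = joint.sum(axis=0, keepdims=True)   # marginal P(Y=j), shape (1, n_y)
    with np.errstate(divide="ignore", invalid="ignore"):
        # P(X=i | Y=j); columns with P(Y=j) = 0 are set to 0 and contribute nothing
        p_x_given_y = np.where(p_y > 0, joint / p_y, 0.0)
        # log P(X=i | Y=j), using the convention 0 * log 0 = 0
        log_term = np.where(p_x_given_y > 0, np.log(p_x_given_y), 0.0)
    return float(-(joint * log_term).sum())


# Toy similarity between three hypothetical perturbed inputs (values are made up).
toy_sim = np.array([[1.0, 0.5, 0.1],
                    [0.5, 1.0, 0.4],
                    [0.1, 0.4, 1.0]])
P = transition_matrix(toy_sim)

# Two extreme joint tables over (X, Y):
h_independent = conditional_entropy(np.full((2, 2), 0.25))  # Y says nothing about X
h_determined = conditional_entropy(np.eye(2) / 2)           # Y pins X down exactly
```

In this toy setting, a larger $H(X|Y)$ means that many different inputs remain plausible once the output is known, which matches the abstract's description of uncertainty as the diversity of the input space conditioned on a given output.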
