Uncertainty Quantification in Large Language Models Through Convex Hull Analysis

Ferhat Ozgur Catak, Murat Kuzlu

TL;DR

Uncertainty quantification for LLMs is critical in high-stakes settings, and traditional probabilistic/ensemble methods struggle with high-dimensional text outputs. The paper proposes a convex hull–based geometric approach that embeds responses with BERT, reduces to 2D via PCA, clusters with DBSCAN, and uses convex hull areas to quantify dispersion. Experiments show that the uncertainty metric $A(p,t)$ depends on prompt type, model, and temperature, with confusing prompts and higher temperatures producing larger dispersion. The method yields an interpretable geometry signal that differentiates models and prompt complexities and can complement existing evaluation criteria.
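
Here is a minimal pure-Python sketch of how a convex hull area turns a cloud of 2D points into a single dispersion number. It assumes the input is the 2D (post-PCA) coordinates of one cluster. It uses Andrew's monotone chain algorithm for the hull and the shoelace formula for the area. The function names are illustrative, not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_area(points: List[Point]) -> float:
    """Shoelace formula over the hull; fewer than 3 hull vertices means zero area."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    s = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


tight = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (0.0, 0.1), (0.05, 0.05)]
spread = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
print(hull_area(tight))   # approximately 0.01
print(hull_area(spread))  # 4.0
```

Tightly grouped points give a small area and widely spread points a large one. Interior points do not change the result, and collinear or fewer than three distinct points give zero area.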

Abstract

Uncertainty quantification has become increasingly critical for large language models (LLMs), particularly in high-risk applications that require reliable outputs. However, traditional uncertainty quantification methods, such as probabilistic models and ensemble techniques, face challenges when applied to the complex, high-dimensional outputs that LLMs generate. This study proposes a novel geometric approach to uncertainty quantification based on convex hull analysis. The proposed method leverages the spatial properties of response embeddings to measure the dispersion and variability of model outputs. Prompts are categorized into three types, 'easy', 'moderate', and 'confusing', and each prompt is used to generate multiple responses from different LLMs at varying temperature settings. The responses are transformed into high-dimensional embeddings via a BERT model and then projected into a two-dimensional space using Principal Component Analysis (PCA). The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to cluster the embeddings, and the convex hull of each selected cluster is computed. The experimental results indicate that LLM uncertainty depends on the prompt complexity, the model, and the temperature setting.
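
The pipeline in the abstract (embed, project with PCA, cluster with DBSCAN, measure convex hull areas) can be sketched end to end in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. Toy vectors stand in for BERT embeddings. PCA is computed with an SVD. DBSCAN is a small hand-written version rather than a library call. The per-prompt score, here the hypothetical `uncertainty_score`, is assumed to be the sum of hull areas over all non-noise clusters. The `eps` and `min_samples` values are placeholders that would need tuning to the scale of real embeddings.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their first two principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def dbscan(P: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Minimal O(n^2) DBSCAN; returns one cluster label per point, -1 for noise."""
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # grow the cluster through density-connected core points
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join the cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


def hull_area(points: list) -> float:
    """Area of the convex hull (monotone chain + shoelace); 0.0 if degenerate."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    chain = []
    for seq in (pts, pts[::-1]):  # lower hull, then upper hull
        part = []
        for p in seq:
            while len(part) >= 2 and cross(part[-2], part[-1], p) <= 0:
                part.pop()
            part.append(p)
        chain += part[:-1]
    x, y = np.array(chain).T
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))


def uncertainty_score(embeddings: np.ndarray, eps: float = 0.5, min_samples: int = 3) -> float:
    """Hypothetical per-prompt score: summed hull area of all non-noise DBSCAN clusters."""
    P = pca_2d(embeddings)
    labels = dbscan(P, eps, min_samples)
    return float(sum(hull_area(P[labels == c].tolist()) for c in set(labels.tolist()) if c != -1))


def grid(x0: float, spacing: float, dim: int = 8) -> np.ndarray:
    """Toy stand-in for embeddings: a 3x3 grid in the first two of `dim` coordinates."""
    X = np.zeros((9, dim))
    X[:, :2] = [(x0 + i * spacing, j * spacing) for i in range(3) for j in range(3)]
    return X


consistent = grid(0.0, 0.1)                               # one tight cluster
dispersed = np.vstack([grid(0.0, 0.4), grid(10.0, 0.4)])  # two wider clusters
print(uncertainty_score(consistent))  # approximately 0.04
print(uncertainty_score(dispersed))   # approximately 1.28
```

The toy points all lie in one plane, so the PCA projection preserves distances, and the scores come out near 0.04 and 1.28. The single tight cluster yields the smaller area, which matches the intuition that consistent responses indicate lower uncertainty. In this sketch, points that DBSCAN labels as noise (-1) contribute nothing to the score.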

Paper Structure

This paper contains 11 sections, 5 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The system overview for calculating uncertainty in LLM responses.
  • Figure 2: (a) Convex hull analysis for a prompt with a single cluster, indicating low variability and greater certainty in the model's responses. (b) Convex hull analysis for a prompt with several clusters, indicating high variability and uncertainty in the model's responses.
  • Figure 3: The relationship between uncertainty and temperature setting, based on convex hull analysis, for (a) easy, (b) moderate, and (c) confusing prompts across GPT-3.5-turbo, GPT-4o, and Gemini-pro outputs.