Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses
Tiejin Chen, Xiaoou Liu, Longchao Da, Jia Chen, Vagelis Papalexakis, Hua Wei
TL;DR
The paper argues that single-dimension uncertainty quantification is insufficient for open-ended LLM outputs. It proposes MD-UQ, a multi-dimensional framework that jointly analyzes semantic similarity and knowledge coherence, represented as a two-sample tensor and disentangled via CP and Tucker decompositions with ensemble scoring. Empirical results on CoQA, HotpotQA, and NQ_Open show that MD-UQ generally outperforms state-of-the-art baselines, particularly on harder datasets, and that it is robust to different knowledge extractors, accuracy thresholds, and similarity metrics. By reducing redundancy and capturing complementary uncertainty cues across dimensions, the approach offers a more reliable uncertainty signal for high-stakes deployment of LLMs.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, owing to large training datasets and powerful transformer architectures. However, the reliability of their responses remains an open question. Uncertainty quantification (UQ) of LLMs is crucial for ensuring their reliability, especially in areas such as healthcare, finance, and decision-making. Existing UQ methods primarily focus on semantic similarity, overlooking the deeper knowledge dimensions embedded in responses. We introduce a multi-dimensional UQ framework that integrates semantic and knowledge-aware similarity analysis. By generating multiple responses and leveraging auxiliary LLMs to extract implicit knowledge, we construct separate similarity matrices and apply tensor decomposition to derive a comprehensive uncertainty representation. This approach disentangles the overlapping information in the semantic and knowledge dimensions, capturing both semantic variation and factual consistency and leading to more accurate UQ. Our empirical evaluations demonstrate that our method outperforms existing techniques in identifying uncertain responses, offering a more robust framework for enhancing LLM reliability in high-stakes applications.
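
As a rough illustration of the pipeline the abstract describes, the Python sketch below stacks a semantic similarity matrix and a knowledge similarity matrix into a single tensor and applies a higher-order SVD, which is one simple Tucker-style decomposition. The function names, the n x n x 2 tensor layout, the toy matrices, and the energy-based score are all assumptions made for illustration. The abstract does not specify how MD-UQ turns the decomposition into an uncertainty score, so this should not be read as the authors' method.

```python
import numpy as np


def stack_similarities(sem_sim, know_sim):
    """Stack two n x n similarity matrices into an n x n x 2 tensor (assumed layout)."""
    sem_sim = np.asarray(sem_sim, dtype=float)
    know_sim = np.asarray(know_sim, dtype=float)
    if sem_sim.ndim != 2 or sem_sim.shape[0] != sem_sim.shape[1]:
        raise ValueError("similarity matrices must be square")
    if sem_sim.shape != know_sim.shape:
        raise ValueError("both similarity matrices must have the same shape")
    return np.stack([sem_sim, know_sim], axis=-1)


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor):
    """Untruncated higher-order SVD: returns a core tensor and one factor per mode."""
    factors = [
        np.linalg.svd(unfold(tensor, mode), full_matrices=False)[0]
        for mode in range(tensor.ndim)
    ]
    core = tensor
    for mode, u in enumerate(factors):
        # Project the current core onto the columns of u along `mode`.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors


def uncertainty_score(core):
    """Hypothetical score: share of energy outside the leading response component.

    Near 0 when every response agrees with every other (rank-one structure),
    and larger when the similarity structure is spread over many components.
    """
    energy = (core ** 2).sum(axis=tuple(range(1, core.ndim)))
    total = energy.sum()
    if total == 0:
        raise ValueError("tensor has no energy; similarity matrices are all zero")
    return float(1.0 - energy[0] / total)


# Toy inputs: four responses that fully agree, and four that share nothing.
agree = np.ones((4, 4))
disagree = np.eye(4)

core_agree, _ = hosvd(stack_similarities(agree, agree))
core_disagree, _ = hosvd(stack_similarities(disagree, disagree))

print(uncertainty_score(core_agree))     # close to 0
print(uncertainty_score(core_disagree))  # 0.75, i.e. 1 - 1/n for n = 4
```

In a real setting the two matrices would come from a sentence-similarity model and from comparing the knowledge statements extracted by an auxiliary LLM. A CP decomposition could be substituted for the higher-order SVD used here, and the scores from the two decompositions combined in an ensemble.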
