API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access
Jiayuan Su, Jing Luo, Hongwei Wang, Lu Cheng
TL;DR
This work tackles uncertainty quantification for API-only LLMs that do not expose logits. It introduces LofreeCP, a logit-free conformal predictor that combines a frequency-based ranking proxy with two fine-grained uncertainty notions—normalized entropy (NE) and semantic similarity (SS)—to form nonconformity scores and calibrated prediction sets under CP. The authors show that frequency-only probability estimation is computationally infeasible and establish a formal coverage guarantee for LofreeCP. Empirically, LofreeCP achieves smaller average prediction set sizes (APSS) and competitive or superior coverage compared with logit-based CP baselines on TriviaQA, WebQuestions, and MMLU. This approach enables practical, calibrated uncertainty estimation for API-based LLMs and broadens CP applicability beyond access to internal model logits.
Abstract
This study addresses the pervasive challenge of quantifying uncertainty in large language models (LLMs) without logit-access. Conformal Prediction (CP), known for its model-agnostic and distribution-free properties, is a desirable approach for various LLMs and data distributions. However, existing CP methods for LLMs typically assume access to the logits, which are unavailable for some API-only LLMs. In addition, logits are known to be miscalibrated, potentially leading to degraded CP performance. To tackle these challenges, we introduce a novel CP method that (1) is tailored for API-only LLMs without logit-access; (2) minimizes the size of prediction sets; and (3) ensures a statistical guarantee of the user-defined coverage. The core idea of this approach is to formulate nonconformity measures using both coarse-grained (i.e., sample frequency) and fine-grained uncertainty notions (e.g., semantic similarity). Experimental results on both closed-ended and open-ended Question Answering tasks show that our approach outperforms the logit-based CP baselines in most settings.
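To make the core idea concrete, the following is a minimal sketch of split conformal prediction driven only by sample frequency, the coarse-grained signal named above. It is an illustration under stated assumptions, not the authors' implementation: the score `1 - frequency`, the worst-case score of 1.0 for an answer that was never sampled, and all function names are hypothetical choices made here. The fine-grained terms (normalized entropy and semantic similarity) are left out because this section does not say how they are combined with frequency.

```python
import math
from collections import Counter


def frequency_scores(samples):
    """Nonconformity of each distinct sampled answer: 1 - (its share of the samples).

    Assumption for this sketch: frequency alone is the score.
    """
    counts = Counter(samples)
    total = len(samples)
    return {answer: 1.0 - count / total for answer, count in counts.items()}


def calibration_score(samples, true_answer):
    """Score of the true answer on one calibration question.

    Assumption: an answer that was never sampled gets the worst score, 1.0.
    """
    return frequency_scores(samples).get(true_answer, 1.0)


def conformal_threshold(cal_scores, alpha):
    """Standard split-conformal quantile.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for the requested level.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(samples, threshold):
    """Keep every sampled answer whose nonconformity score is within the threshold."""
    return {a for a, s in frequency_scores(samples).items() if s <= threshold}
```

In use, one would sample the API several times per calibration question, collect `calibration_score` for each, compute `conformal_threshold` once at the chosen error level `alpha`, and then call `prediction_set` on the samples drawn for each test question. Note that this sketch can only return answers that were actually sampled, so a true answer that never appears among the samples cannot be covered.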
