PCS: Perceived Confidence Scoring of Black Box LLMs with Metamorphic Relations

Sina Salimian, Gias Uddin, Shaina Raza, Henry Leung

TL;DR

PCS is a black-box framework, driven by metamorphic relations, for estimating confidence in zero-shot text classification by measuring label consistency across semantically equivalent input variants. It learns a weight for each metamorphic relation and for each LLM via linear regression, yielding interpretable confidence scores that can be aggregated across multiple LLMs. Across three datasets and three instruction-tuned LLMs, PCS yields significant AUROC gains over single-model and ensemble baselines, and outperforms SPUQ and top-k confidence methods in most cases. The approach supports both single-LLM and multi-LLM settings and is implemented in the PCS Toolbox (pcs-annotator) for reproducibility. Overall, PCS improves the reliability of black-box LLM-assisted classification by combining controlled linguistic perturbations with cross-model agreement.

Abstract

Zero-shot LLMs are now also used for textual classification tasks, e.g., sentiment and bias detection in a sentence or article. However, their performance can be suboptimal in such data annotation tasks. We introduce a novel technique that evaluates an LLM's confidence in classifying a textual input by leveraging Metamorphic Relations (MRs). The MRs generate semantically equivalent yet textually divergent versions of the input. Following the principles of Metamorphic Testing (MT), the mutated versions are expected to receive annotation labels similar to that of the original input. By analyzing the consistency of an LLM's responses across these variations, we compute a perceived confidence score (PCS) based on the frequency of the predicted labels. PCS can be used in both single-LLM and multi-LLM settings (e.g., when multiple LLMs are vetted in a majority-voting setup). Empirical evaluation shows that our PCS-based approach improves the performance of zero-shot LLMs by 9.3% in textual classification tasks. When multiple LLMs are used in a majority-voting setup, we obtain a performance boost of 5.8% with PCS.
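The frequency-based idea in the abstract can be illustrated with a minimal sketch. It is not the pcs-annotator API: the function name `perceived_confidence`, the `classify` callable, the toy mutators, and the optional `weights` argument are all hypothetical stand-ins. In the paper the per-MR weights are learned via linear regression; here they are simply passed in, and they default to an unweighted vote.

```python
from typing import Callable, Dict, Optional, Tuple


def perceived_confidence(
    text: str,
    classify: Callable[[str], str],
    mutators: Dict[str, Callable[[str], str]],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[str, Dict[str, float]]:
    """Return (top label, label -> score) from votes over the input and its variants.

    Assumption: each score is the (optionally weighted) share of variants
    that received that label. This is an illustration only.
    """
    # The original input votes under the hypothetical key "original".
    variants = {"original": text}
    for name, mutate in mutators.items():
        variants[name] = mutate(text)

    votes: Dict[str, float] = {}
    total = 0.0
    for name, variant in variants.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        label = classify(variant)
        votes[label] = votes.get(label, 0.0) + w
        total += w

    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    scores = {label: v / total for label, v in votes.items()}
    top = max(scores, key=lambda label: scores[label])
    return top, scores


# Toy stand-ins for an LLM classifier and for metamorphic relations.
def toy_classify(s: str) -> str:
    return "positive" if "good" in s.lower() else "negative"


toy_mutators = {
    "uppercase": str.upper,
    "synonym": lambda s: s.replace("good", "fine"),  # flips the toy label
    "suffix": lambda s: s + " indeed",
}

label, scores = perceived_confidence("The movie was good", toy_classify, toy_mutators)
print(label, scores)
```

In this toy run, three of the four variants agree, so the majority label gets a score of 0.75. Setting a mutator's weight to zero removes its vote, which loosely mirrors how a learned weight could down-weight an unreliable relation.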

Paper Structure

This paper contains 27 sections, 5 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Overview of the classification pipeline using Perceived Confidence Score (PCS) Annotator.
  • Figure 2: Effect of optimizing dataset size on AUROC score using the PCS
  • Figure 3: Overview of the pcs-annotator architecture. The system integrates four core components: LLM, Annotator, TextMutator, and PCS modules, orchestrating the data flow from text mutation to confidence computation.