TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Danna Zheng; Danyang Liu; Mirella Lapata; Jeff Z. Pan

TL;DR

TrustScore tackles the problem of evaluating LLM answer trustworthiness in closed-book QA by introducing a reference-free framework based on Behavioral Consistency. The core component, Trust$_{BC}$, tests whether an LLM consistently selects its own answer among distractors, and can be augmented with external fact-checking through Trust$_{FC}$ to form Trust$_{OV}$. Across experiments on MixedQA with multiple LLMs, TrustScore shows strong alignment with human judgments, outperforming existing reference-free metrics and matching reference-based metrics in many settings. The approach offers a practical, modular way to assess and potentially improve LLM trustworthiness in real-world, data-scarce scenarios where external evidence may be unavailable.

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLM outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground-truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics and performing on par with reference-based metrics.

Paper Structure (34 sections, 1 figure, 7 tables)

Introduction
Proposed Method
Trust$_{BC}$ Stand-Alone
Integration of Trust$_{BC}$ and Trust$_{FC}$
Experiment Setup
Data Collection
LLMs
Baseline Metrics
Evaluation
Results
Behavioral Consistency Analysis
Integration of Trust$_{BC}$ and Trust$_{FC}$
Robustness to Diverse Answers
Related Work
QA Evaluation Metrics
...and 19 more sections

Figures (1)

Figure 1: An illustration of Behavioral Consistency Evaluator (Trust$_{BC}$). LLM responses pass through Trust$_{BC}$ to determine if the model selects its original answer consistently across multiple iterations.
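
The check described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `behavioral_consistency`, the `choose` callback (a hypothetical stand-in for querying the LLM with a multiple-choice prompt), the shuffling of options, the default of three iterations, and the fraction-based score are all assumptions made here for clarity. How distractors are generated and how the final Trust$_{BC}$ value is computed are not specified in this summary.

```python
import random
from typing import Callable, Sequence


def behavioral_consistency(
    question: str,
    original_answer: str,
    distractors: Sequence[str],
    choose: Callable[[str, Sequence[str]], int],
    iterations: int = 3,
    seed: int = 0,
) -> float:
    """Fraction of iterations in which the model re-selects its own answer.

    `choose` is a hypothetical callback: given the question and a list of
    options, it returns the index of the option the model picks.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")

    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        # Mix the original answer with the distractors in a fresh order
        # (an assumption, meant to avoid rewarding a fixed position).
        options = [original_answer, *distractors]
        rng.shuffle(options)
        picked = choose(question, options)
        if options[picked] == original_answer:
            hits += 1
    return hits / iterations
```

A score of 1.0 under this sketch means the model picked its original answer in every iteration; anything lower indicates it abandoned that answer for a distractor at least once.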
