Self-Evaluation of Large Language Model based on Glass-box Features

Hui Huang, Yingqi Qu, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao, Wenpeng Lu

TL;DR

This study investigates various glass-box feature groups and finds that the softmax distribution serves as a reliable quality indicator for the self-evaluation of LLMs.

Abstract

The proliferation of open-source Large Language Models (LLMs) underscores the pressing need for evaluation methods. Existing works primarily rely on external evaluators, focusing on training and prompting strategies. However, a crucial aspect, model-aware glass-box features, has been overlooked. In this study, we explore the utility of glass-box features in the self-evaluation scenario, namely applying an LLM to evaluate its own output. We investigate various glass-box feature groups and find that the softmax distribution serves as a reliable quality indicator for self-evaluation. Experimental results on public benchmarks validate the feasibility of self-evaluation of LLMs using glass-box features.
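
As a rough illustration of what a softmax-based glass-box feature can look like, the sketch below computes two common quantities from per-step logits: the mean log-probability of the generated tokens and the mean entropy of the softmax distribution. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`log_softmax`, `softmax_features`), the input layout (one list of logits per decoding step), and the choice of these two features are all hypothetical, since the abstract does not specify the exact features used.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one decoding step's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax_features(step_logits, chosen_ids):
    """Hypothetical helper: summarize the softmax distribution of a generation.

    step_logits: list of per-step logit lists (assumed layout).
    chosen_ids:  index of the token actually generated at each step.
    Returns the mean token log-probability and the mean softmax entropy.
    """
    if not step_logits or len(step_logits) != len(chosen_ids):
        raise ValueError("need one chosen token id per decoding step")

    token_log_probs = []
    entropies = []
    for logits, token_id in zip(step_logits, chosen_ids):
        log_probs = log_softmax(logits)
        token_log_probs.append(log_probs[token_id])
        # Entropy H = -sum_i p_i * log p_i, with p_i = exp(log p_i).
        entropies.append(-sum(math.exp(lp) * lp for lp in log_probs))

    n = len(token_log_probs)
    return {
        "mean_log_prob": sum(token_log_probs) / n,
        "mean_entropy": sum(entropies) / n,
    }


# Toy comparison: a peaked distribution versus a uniform one.
confident = softmax_features([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]], [0, 1])
uncertain = softmax_features([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, 1])
```

In this toy comparison the peaked distribution yields a higher mean log-probability and a lower mean entropy than the uniform one, which is the kind of signal a self-evaluation score could be built on.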

Paper Structure (17 sections, 9 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Prompt pool for prompt-based ensemble uncertainty estimation.
  • Figure 2: Prompt format with in-context illustration. The shaded part is the illustration with reference.
  • Figure 3: Prompt template for GPT4 and GPT-3.5-Turbo applied for single-turn evaluation.
  • Figure 4: Prompt template for GPT4 and GPT-3.5-Turbo applied for multi-turn evaluation.
  • Figure 5: Prompt template for Auto-J applied for single-turn evaluation.
  • ...and 3 more figures