Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding

Xin Liu, Farima Fatahi Bayat, Lu Wang

TL;DR

This paper proposes two methods: ActCab, an activation-based calibration method that trains a linear layer on top of the LM's last-layer activations, which better capture the representations of knowledge, and CoDec, a confidence-guided decoding strategy that elicits truthful, high-confidence answers from LMs.

Abstract

Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness, which can inform users about LMs' reliability and mitigate hallucinated content. However, prior calibration methods, such as self-consistency-based and logit-based approaches, are either limited in inference-time efficiency or fall short of providing informative signals. Moreover, simply filtering out low-confidence responses reduces the LM's helpfulness when the answers are correct. Therefore, effectively using calibration techniques to enhance an LM's factuality remains an unsolved challenge. In this paper, we first propose an activation-based calibration method, ActCab, which trains a linear layer on top of the LM's last-layer activations that can better capture the representations of knowledge. Built on top of ActCab, we further propose CoDec, a confidence-guided decoding strategy to elicit truthful answers with high confidence from LMs. Evaluated on five popular QA benchmarks, ActCab achieves better calibration performance than all competitive baselines, e.g., reducing the average expected calibration error (ECE) score by up to 39%. Further experiments on CoDec show consistent improvements in several LMs' factuality on challenging QA datasets, such as TruthfulQA, highlighting the value of confidence signals in enhancing factuality.
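
To make the ECE metric cited in the abstract concrete, here is a minimal sketch of the standard equal-width-bin formulation: predictions are grouped by confidence, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The function name, the choice of 10 bins, and the half-open bin convention are assumptions for illustration; the abstract does not say how the paper computes ECE.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE over confidences in [0, 1].

    Assumption: 10 bins of the form [lo, hi), with 1.0 placed in the last bin.
    This is a common convention, not necessarily the paper's exact setup.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Collect (confidence, correctness) pairs per bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(is_correct)))

    # Weighted average of |accuracy - mean confidence| across non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy numbers for illustration only: four answers at 0.9 confidence, three correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A lower value means the stated confidence tracks the observed accuracy more closely; a model that is always right at confidence 1.0 scores 0.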

Paper Structure (22 sections, 4 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: The process of constructing soft training labels for ECE loss. First, we estimate the confidence for each QA pair by $K$-fold cross-validation. Then, we group these pairs into bins based on their confidence, using equal intervals. Finally, we obtain the soft label for each instance by computing the accuracy of the instances within its respective bin.
  • Figure 2: The process of CoDec decoding. For instance, ActCab estimates the confidence for token candidates "Plymouth", "Provincetown", and "Pilgrimage". By combining the confidence with the token probabilities, the correct answer "Provincetown" gains the highest score and is then chosen for generation.
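
The two figure captions above can be sketched in a few lines of Python. The first function follows the binning step of Figure 1: given per-instance confidences (assumed to come from the K-fold cross-validation step, which is not reproduced here), it assigns each instance the accuracy of its equal-width bin as a soft label. The second function illustrates the idea in Figure 2 of combining a confidence estimate with token probabilities. The captions do not give the combination rule, so the weighted sum and the `weight` parameter are purely hypothetical, as are all function names and the toy numbers.

```python
def soft_labels_from_bins(confidences, correct, n_bins=10):
    """Soft label per instance = accuracy of its equal-width confidence bin.

    Assumption: confidences were already estimated (e.g. by K-fold
    cross-validation) and lie in [0, 1]; the bin count is illustrative.
    """
    bin_ids = [min(int(c * n_bins), n_bins - 1) for c in confidences]
    totals = [0] * n_bins
    hits = [0] * n_bins
    for b, is_correct in zip(bin_ids, correct):
        totals[b] += 1
        hits[b] += int(is_correct)
    # Every instance in the same bin receives the same soft label.
    return [hits[b] / totals[b] for b in bin_ids]


def pick_token(candidates, weight=0.5):
    """Choose the candidate with the highest combined score.

    `candidates` maps token -> (token_probability, estimated_confidence).
    Hypothetical rule: a weighted sum of the two signals. The actual CoDec
    scoring function may differ from this.
    """
    def score(item):
        _, (prob, conf) = item
        return (1.0 - weight) * prob + weight * conf

    best_token, _ = max(candidates.items(), key=score)
    return best_token


if __name__ == "__main__":
    # Toy values, not taken from the paper.
    print(soft_labels_from_bins([0.05, 0.08, 0.95, 0.91], [0, 1, 1, 1]))
    candidates = {
        "Plymouth": (0.5, 0.2),      # most probable token, low confidence
        "Provincetown": (0.3, 0.9),  # less probable, high confidence
        "Pilgrimage": (0.2, 0.1),
    }
    print(pick_token(candidates))
```

With these toy values, "Provincetown" is selected even though "Plymouth" has the higher token probability, which mirrors the behavior the Figure 2 caption describes.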