Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

Taehun Cha, Donghun Lee

TL;DR

This work shows that pre-trained language models return distinguishable generation probability and uncertainty distributions for unfaithfully hallucinated texts, regardless of model size and structure. It also showcases a hallucination-reducing training algorithm that outperforms other baselines, achieving higher faithfulness metrics while maintaining sound general text quality measures.

Abstract

In this work, we show that pre-trained language models return distinguishable generation probability and uncertainty distributions for unfaithfully hallucinated texts, regardless of their size and structure. Examining 24 models on 6 data sets, we find that 88-98% of cases yield statistically significantly distinguishable generation probability and uncertainty distributions. Building on this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms other baselines, achieving higher faithfulness metrics while maintaining sound general text quality measures.

Paper Structure (17 sections, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: Empirical entropy distributions and means of $D_{Hallucinated}$ and $D_{Entailed}$ for each model and data set. We first compute the entropy of each data point, then separate the points by hallucination label. The $x$-axis represents entropy and the $y$-axis represents relative frequency. We plot the results for the smallest model of each model type.
  • Figure 2: Visualization of the size effect. We divide all Wasserstein distances by the distance from the smallest model to visualize the relative change as model size grows.
  • Figure 3: Fine-tuning effect for the WOW data set. We divide all Wasserstein distances by the distance from the pre-trained model to visualize the relative change as training proceeds.
  • Figure 4: Visualization of the Kolmogorov–Smirnov statistic and the Wasserstein distance. The red and blue histograms are the two CDFs being compared; the yellow arrow and the area represent the Kolmogorov–Smirnov statistic and the Wasserstein distance, respectively.
  • Figure 5: Fine-tuning effect for the CMU data set. We divide all Wasserstein distances by the distance from the pre-trained model to visualize the relative change as training proceeds.
  • ...and 1 more figure
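
The statistics named in the captions above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`entropy`, `ks_statistic`, `wasserstein_distance`, `relative_to_first`) and the toy entropy values are hypothetical, and the paper may compute entropy per data point differently. The Kolmogorov–Smirnov statistic is taken as the largest vertical gap between two empirical CDFs, and the 1-Wasserstein distance as the area between them. SciPy users could likely obtain the same quantities from `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance`.

```python
import bisect
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def _ecdf(sorted_sample, x):
    """Empirical CDF of an already sorted sample, evaluated at x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    # Both ECDFs are step functions, so the maximum gap occurs at a sample point.
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in a + b)


def wasserstein_distance(a, b):
    """1-Wasserstein distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    grid = sorted(a + b)
    total = 0.0
    # Between consecutive pooled sample points both ECDFs are constant.
    for left, right in zip(grid, grid[1:]):
        total += abs(_ecdf(a, left) - _ecdf(b, left)) * (right - left)
    return total


def relative_to_first(distances):
    """Divide every distance by the first one (e.g. the smallest model's)."""
    return [d / distances[0] for d in distances]


# Toy per-example entropies, made up for illustration only.
hallucinated = [2.0, 2.5, 3.0]
entailed = [1.0, 1.5, 2.0]

ks = ks_statistic(hallucinated, entailed)
wd = wasserstein_distance(hallucinated, entailed)
print(f"KS statistic: {ks:.3f}, Wasserstein distance: {wd:.3f}")
print(relative_to_first([wd, 2 * wd, 3 * wd]))
```

Because the toy `hallucinated` sample is the `entailed` sample shifted by one unit, the Wasserstein distance in this example equals that shift, which gives a quick sanity check on the implementation.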