Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

Taehun Cha; Donghun Lee

TL;DR

This work shows that pre-trained language models return distinguishable generation probability and uncertainty distributions for unfaithfully hallucinated texts, regardless of model size and structure. It also showcases a hallucination-reducing training algorithm that outperforms other baselines, achieving higher faithfulness metrics while maintaining sound general text quality measures.

Abstract

In this work, we show that pre-trained language models return distinguishable generation probability and uncertainty distributions for unfaithfully hallucinated texts, regardless of their size and structure. Examining 24 models on 6 data sets, we find that in 88-98% of cases the generation probability and uncertainty distributions are distinguishable with statistical significance. Using this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms other baselines, achieving higher faithfulness metrics while maintaining sound general text quality measures.
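As a rough illustration of what "distinguishable distributions" can mean in practice, the sketch below compares two samples of per-text scores with the two statistics named in the paper outline, the Kolmogorov–Smirnov statistic and the Wasserstein distance. It is a minimal standard-library sketch, not the authors' code: the function names, the toy scores, and the permutation test (a stand-in for whichever significance test the paper actually uses) are all assumptions.

```python
import random
from bisect import bisect_right


def _ecdf_gaps(xs, ys):
    """Yield (value, |F_x(value) - F_y(value)|) over the pooled sample values."""
    xs, ys = sorted(xs), sorted(ys)
    for v in sorted(set(xs) | set(ys)):
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        yield v, gap


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    return max(gap for _, gap in _ecdf_gaps(xs, ys))


def wasserstein_1(xs, ys):
    """1-Wasserstein distance: the area between the two empirical CDFs."""
    points = list(_ecdf_gaps(xs, ys))
    return sum(gap * (nxt[0] - v) for (v, gap), nxt in zip(points, points[1:]))


def permutation_p_value(xs, ys, statistic, n_permutations=200, seed=0):
    """Assumed stand-in significance test: shuffle labels, recompute statistic."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy scores (e.g. per-text entropy); not data from the paper.
entailed = [0, 1, 2, 3]
hallucinated = [2, 3, 4, 5]
print(ks_statistic(entailed, hallucinated))   # 0.5
print(wasserstein_1(entailed, hallucinated))  # 2.0
```

With real data, the two input lists would hold one score per text for the entailed and hallucinated subsets, and a small p-value would indicate that the two score distributions differ.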

Paper Structure (17 sections, 6 figures, 4 tables, 1 algorithm)

This paper contains 17 sections, 6 figures, 4 tables, 1 algorithm.

Introduction
Related Works
Suggested Metrics
Distribution Distinguishability
Experimental Setup
Does PLM return Distinguishable Distributions to Unfaithful Texts?
Model Size Effect
Fine-tuning Effect
Hallucination Reduction with Weighted Training
Conclusion
Visualization of KS Statistic and Wasserstein Distance
Basic Statistics of Data
Hallucination Data
Weighted Training Data
Log Token Probability Distribution for Hallucinated and Entailed Data Sets
...and 2 more sections

Figures (6)

Figure 1: Empirical entropy distributions and means of $D_{Hallucinated}$ and $D_{Entailed}$ for each model and data set. We first compute the entropy of each data point, then separate the points according to their hallucination label. The $x$-axis represents entropy and the $y$-axis represents relative frequency. We plot the results for the smallest model of each model type.
Figure 2: Visualization of the size effect. We divide all Wasserstein distances by the distance from the smallest model to visualize the relative change as model size grows.
Figure 3: Fine-tuning effect for the WOW data set. We divide all Wasserstein distances by the distance from the pre-trained model to visualize the relative change as training proceeds.
Figure 4: Visualization of the Kolmogorov–Smirnov statistic and the Wasserstein distance. The red and blue histograms are the two CDFs being compared; the yellow arrow and the yellow area represent the Kolmogorov–Smirnov statistic and the Wasserstein distance, respectively.
Figure 5: Fine-tuning effect for the CMU data set. We divide all Wasserstein distances by the distance from the pre-trained model to visualize the relative change as training proceeds.
...and 1 more figures
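The captions above describe three small computations: an entropy score per data point, a split of the scores by hallucination label, and a normalization of distances by a baseline (the smallest or the pre-trained model). The sketch below illustrates them under stated assumptions. The paper's exact entropy definition is not given in this excerpt, so the per-text score is assumed here to be the mean Shannon entropy of the per-token predictive distributions; all function names and toy values are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sequence_entropy(token_distributions):
    """Per-text score: mean token entropy (the averaging is an assumption)."""
    entropies = [token_entropy(p) for p in token_distributions]
    return sum(entropies) / len(entropies)


def split_by_label(scores, is_hallucinated):
    """Separate per-text scores into (hallucinated, entailed) lists."""
    hallucinated = [s for s, h in zip(scores, is_hallucinated) if h]
    entailed = [s for s, h in zip(scores, is_hallucinated) if not h]
    return hallucinated, entailed


def relative_to_baseline(distances):
    """Divide every distance by the first one (the baseline model's distance)."""
    baseline = distances[0]
    return [d / baseline for d in distances]


# Hypothetical toy inputs; not values from the paper.
texts = [
    [[0.5, 0.5], [1.0, 0.0]],   # one uncertain token, one certain token
    [[0.25, 0.25, 0.25, 0.25]],  # a single maximally uncertain token
]
scores = [sequence_entropy(t) for t in texts]
hallucinated, entailed = split_by_label(scores, [False, True])
print(hallucinated, entailed)
print(relative_to_baseline([2.0, 3.0, 1.0]))  # [1.0, 1.5, 0.5]
```

Normalizing by the baseline makes the first entry exactly 1, so later entries read directly as relative growth or shrinkage of the distance.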
