SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully

Jushi Kai, Tianhang Zhang, Hai Hu, Zhouhan Lin

TL;DR

This work proposes an inference-time method, Self-Highlighted Hesitation (SH2), that helps LLMs decode more truthfully. It selects the tokens to which the model assigns the lowest probabilities and concatenates them to the original context, forcing the model to read and hesitate on these tokens repeatedly before generation.
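The token-selection step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the per-token probabilities p(x_t | x_<t) have already been computed by the model with teacher forcing. It also selects a fixed number k of tokens rather than a proportion, and it appends the hesitation using a hypothetical format: a newline followed by the space-joined tokens. The function names select_hesitation_tokens and build_hesitated_input are illustrative only.

```python
from typing import List, Sequence


def select_hesitation_tokens(
    tokens: Sequence[str], probs: Sequence[float], k: int
) -> List[str]:
    """Return the k tokens with the lowest predicted probability, in document order.

    Assumption: probs[i] is the model's probability of tokens[i] given the
    tokens before it (teacher forcing).
    """
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    k = max(0, min(k, len(tokens)))
    # Rank positions by probability (ties broken by position) and keep the k lowest.
    lowest = sorted(range(len(tokens)), key=lambda i: (probs[i], i))[:k]
    # Restore the original order so the highlighted tokens read left to right.
    return [tokens[i] for i in sorted(lowest)]


def build_hesitated_input(document: str, hesitation: List[str]) -> str:
    """Append the highlighted tokens after the document (hypothetical format)."""
    return document + "\n" + " ".join(hesitation)


tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
probs = [0.9, 0.05, 0.6, 0.8, 0.7, 0.02, 0.95]
key_tokens = select_hesitation_tokens(tokens, probs, k=2)
print(key_tokens)  # ['Eiffel', 'Paris']
print(build_hesitated_input("The Eiffel Tower is in Paris.", key_tokens))
```

In this toy example, the two least-expected tokens are the proper nouns. This mirrors the observation that low-probability tokens tend to carry factual content. To select a proportion instead of a count, one could set k to round(eta * len(tokens)). Keeping the highlighted tokens in document order is a design choice of this sketch.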

Abstract

Large language models (LLMs) perform well in text generation, but they still suffer from hallucinations. In this work, we propose an inference-time method, Self-Highlighted Hesitation (SH2), to help LLMs decode more truthfully. SH2 builds on a simple observation rooted in information theory: tokens that an LLM predicts with lower probabilities tend to be more informative than others. Our analysis shows that the tokens to which an LLM assigns lower probabilities are more likely to be closely related to factual information, such as nouns, proper nouns, and adjectives. We therefore propose to "highlight" this factual information by selecting the tokens with the lowest probabilities and concatenating them to the original context, forcing the model to read and hesitate on these tokens repeatedly before generation. During decoding, we also adopt contrastive decoding to emphasize the difference in output probabilities brought about by the hesitation. Experimental results show that SH2, which requires no additional data or models, effectively helps LLMs elicit factual knowledge and distinguish hallucinated contexts. SH2 achieves significant and consistent improvements for LLaMA-7b, LLaMA2-7b, and Mistral-7b on multiple hallucination tasks.
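The abstract does not spell out the exact contrastive rule. The following is therefore a hedged sketch of one common formulation of contrastive decoding, assumed here rather than taken from the paper. Each candidate next token y is scored by log p(y | X') - log p(y | X), where X' is the hesitated input and X is the original input. Only tokens whose probability under X' is at least alpha times the maximum are kept, an adaptive plausibility constraint of the kind used in standard contrastive decoding. The distributions are plain lists over a toy vocabulary. The function names and the alpha hyperparameter are hypothetical.

```python
import math
from typing import List, Sequence


def contrastive_scores(
    p_hesitated: Sequence[float],
    p_original: Sequence[float],
    alpha: float = 0.1,
) -> List[float]:
    """Score each vocabulary item by log p(y|X') - log p(y|X).

    Assumes both inputs are softmax outputs over the same vocabulary
    (strictly positive). Tokens failing the plausibility test get -inf.
    """
    if len(p_hesitated) != len(p_original):
        raise ValueError("distributions must cover the same vocabulary")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumed plausibility constraint: keep only tokens that are reasonably
    # likely under the hesitated input X'.
    cutoff = alpha * max(p_hesitated)
    scores = []
    for p_h, p_o in zip(p_hesitated, p_original):
        if p_h >= cutoff:
            # Reward tokens whose probability rose because of the hesitation.
            scores.append(math.log(p_h) - math.log(p_o))
        else:
            scores.append(float("-inf"))
    return scores


def greedy_contrastive_step(p_hesitated, p_original, alpha=0.1) -> int:
    """Return the index of the highest-scoring next token."""
    scores = contrastive_scores(p_hesitated, p_original, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


vocab = ["Paris", "London", "Rome"]
p_x_prime = [0.6, 0.3, 0.1]  # next-token probabilities given hesitated input X'
p_x = [0.4, 0.5, 0.1]        # next-token probabilities given original input X
print(vocab[greedy_contrastive_step(p_x_prime, p_x)])  # Paris
```

The plausibility mask matters here. Without it (alpha = 0), a token that is unlikely under both inputs can win simply because its probability ratio is large. With a positive alpha, the top token under X' always survives the mask.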

Paper Structure

This paper contains 38 sections, 6 equations, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: The pipeline to construct and leverage our Self-Highlighted Hesitation. The original input X consists of the instruction, document and summary. The hesitation of key tokens is appended to the document in the hesitated input X'.
  • Figure 2: Heatmap of the normalized top-$\eta$ recall for the 20 most frequent POS tags. Lighter colors and higher values indicate that a POS tag occupies a higher proportion of the hardest part. We sample 1,000 documents from the summarization track of HaluEval [HaluEval], a dataset collected from CNN/Daily Mail [cnndm], and use LLaMA2-7b [llama2] to extract from them the hardest words that contain key tokens, with the proportion $\eta$ ranging from 1% to 10%.
  • Figure 3: Different choices of highlighted tokens by LLaMA2-7b.
  • Figure 4: Precision, recall, and F1 scores (%) for different highlighted tokens, with the effective sampling proportion $\eta'$ ranging from 1% to 8%. Error bars denote standard deviations. Scores for vanilla LLaMA2-7b are obtained by evaluating on the whole HaluEval-Sum dataset.
  • Figure 5: Effect of contrastive decoding on hesitations. The gray dashed line represents the MC2 score of vanilla LLaMA-7b. Dashed lines in other colors represent the MC2 scores of standard contrastive decoding with different numbers of highlighted tokens.