Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation

Hieu Nguyen, Zihao He, Shoumik Atul Gandre, Ujjwal Pasupulety, Sharanya Kumari Shivakumar, Kristina Lerman

TL;DR

This work targets hallucinations in large language models by critiquing hard-label supervision and introducing smoothed soft-label knowledge distillation (KD). A teacher model emits probabilistic next-token distributions, and the student is trained with a combined objective that includes a KD term to discourage overconfidence and promote contextual grounding. Across multiple model families (LLaMA-2, LLaMA-3.1, Qwen-2.5) and summarization benchmarks (CNN/Daily Mail, XSUM), KD generally reduces faithfulness hallucination while preserving or improving general NLP performance. The study demonstrates that uncertainty-aware supervision enhances factual grounding, with practical implications for deploying more reliable LLMs in high-stakes contexts, while acknowledging computational costs and scope limitations. Future work may extend KD to pretraining, integrate with retrieval-based grounding, and broaden evaluation to encompass diverse hallucination types and modalities.

Abstract

Large language models (LLMs) often suffer from hallucination, generating factually incorrect or ungrounded content, which limits their reliability in high-stakes applications. A key factor contributing to hallucination is the use of hard labels during training, which enforce deterministic supervision, encourage overconfidence, and disregard the uncertainty inherent in natural language. To address this, we propose mitigating hallucination through knowledge distillation (KD), where a teacher model provides smoothed soft labels to a student model, reducing overconfidence and improving factual grounding. We apply KD during supervised finetuning on instructional data, evaluating its effectiveness across LLMs from different families. Experimental results on summarization benchmarks demonstrate that KD reduces hallucination compared to standard finetuning while preserving performance on general NLP tasks. These findings highlight KD as a promising approach for mitigating hallucination in LLMs and improving model reliability.
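The soft-label idea described above can be sketched for a single token position. This is a minimal illustration, not the authors' implementation: the mixing weight `alpha`, the `temperature` parameter, and the function names are assumptions introduced here, and the paper's exact objective may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """H(target, predicted) = -sum_i target_i * log(predicted_i)."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))

def combined_loss(student_logits, teacher_logits, gold_index,
                  alpha=0.5, temperature=1.0):
    """Hypothetical mix of a hard-label term and a soft-label KD term.

    alpha and temperature are illustrative assumptions, not values
    taken from the paper.
    """
    vocab_size = len(student_logits)
    # Hard label: one-hot vector with all mass on the gold token.
    hard_label = [1.0 if i == gold_index else 0.0 for i in range(vocab_size)]
    # Soft label: the teacher's full next-token distribution.
    teacher_probs = softmax(teacher_logits, temperature)
    ce_term = cross_entropy(hard_label, softmax(student_logits))
    kd_term = cross_entropy(teacher_probs, softmax(student_logits, temperature))
    return (1 - alpha) * ce_term + alpha * kd_term

# Toy 3-token vocabulary; the logits are made-up numbers.
student = [2.0, 1.0, 0.0]
teacher = [1.0, 0.9, 0.2]
loss = combined_loss(student, teacher, gold_index=0, alpha=0.5)
```

With `alpha=0` the sketch reduces to standard hard-label finetuning, while `alpha=1` trains the student purely against the teacher's distribution, so the student is no longer pushed to assign zero probability to plausible alternatives.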

Paper Structure

This paper contains 28 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Comparison of cross-entropy optimization with hard labels vs. smoothed soft labels. The figure illustrates how training with (a) hard labels differs from training with (b) contextually smoothed labels in an autoregressive language model. In (a), the hard label for the word “physics” is represented as a one-hot encoded (OHE) vector, assigning full probability (1.0) to a single token while forcing all alternative predictions (e.g., “Maths”, “Assignments”, “Arts”) to have zero probability. This OHE representation introduces zero entropy, disregarding the inherent uncertainty in natural language, and leading the model to overconfidently discard reasonable alternatives. This forced certainty can cause the model to develop spurious assumptions and hallucinate incorrect outputs when faced with ambiguous contexts.
  • Figure 2: Comparison of summaries generated by the SFT and KD models. The SFT summary introduces hallucinated content (highlighted in red) that is factually incorrect or not present in the input context. In contrast, the KD summary remains faithful (highlighted in blue) to the provided input, accurately conveying key details without introducing unrelated or incorrect facts. This case study illustrates the effectiveness of knowledge distillation in mitigating hallucination and improving factual consistency.
  • Figure 3: Kernel density estimation of confidence levels of incorrect answers in vanilla and finetuned (a) LLaMA-2-7B, (b) Mistral-7B, (c) Falcon-7B, (d) Pythia-6.9B, when evaluated on the validation set of CommonsenseQA. The confidence level is measured as the NLL.
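
The zero-entropy property of one-hot labels noted in the Figure 1 caption can be checked numerically. The sketch below is only an illustration: the four-token vocabulary and the soft-label probabilities are made-up values, not numbers from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return sum(-p * math.log(p) for p in probs if p > 0)

# Hypothetical 4-token vocabulary: ["physics", "Maths", "Assignments", "Arts"].
hard_label = [1.0, 0.0, 0.0, 0.0]    # one-hot: all mass on "physics"
soft_label = [0.7, 0.15, 0.1, 0.05]  # illustrative teacher-style distribution

hard_entropy = entropy(hard_label)   # exactly zero: no uncertainty is expressed
soft_entropy = entropy(soft_label)   # positive: alternatives keep some mass
```

The one-hot target carries no information about which alternatives are plausible, whereas the soft target ranks them, which is the signal the student receives under distillation.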