Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
Hieu Nguyen, Zihao He, Shoumik Atul Gandre, Ujjwal Pasupulety, Sharanya Kumari Shivakumar, Kristina Lerman
TL;DR
This work targets hallucinations in large language models by critiquing hard-label supervision and introducing smoothed soft-label knowledge distillation (KD). A teacher model emits probabilistic next-token distributions, and the student is trained with a combined objective that includes a KD term to discourage overconfidence and promote contextual grounding. Across multiple model families (LLaMA-2, LLaMA-3.1, Qwen-2.5) and summarization benchmarks (CNN/Daily Mail, XSUM), KD generally reduces faithfulness hallucination while preserving or improving general NLP performance. The study demonstrates that uncertainty-aware supervision enhances factual grounding, with practical implications for deploying more reliable LLMs in high-stakes contexts, while acknowledging computational costs and scope limitations. Future work may extend KD to pretraining, integrate with retrieval-based grounding, and broaden evaluation to encompass diverse hallucination types and modalities.
Abstract
Large language models (LLMs) often suffer from hallucination, generating factually incorrect or ungrounded content, which limits their reliability in high-stakes applications. A key factor contributing to hallucination is the use of hard labels during training, which enforce deterministic supervision, encourage overconfidence, and disregard the uncertainty inherent in natural language. To address this, we propose mitigating hallucination through knowledge distillation (KD), where a teacher model provides smoothed soft labels to a student model, reducing overconfidence and improving factual grounding. We apply KD during supervised finetuning on instructional data, evaluating its effectiveness across LLMs from different families. Experimental results on summarization benchmarks demonstrate that KD reduces hallucination compared to standard finetuning while preserving performance on general NLP tasks. These findings highlight KD as a promising approach for mitigating hallucination in LLMs and improving model reliability.
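The combined objective described above can be sketched for a single next-token position. This is a minimal illustration rather than the authors' implementation: the abstract does not state the exact loss, so the mixing weight `alpha`, the `temperature` parameter, the forward KL direction (teacher to student), and all function names are assumptions made here for illustration.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]

def hard_label_loss(student_logits, target_index):
    """Standard cross-entropy against a one-hot (hard) label."""
    return -log_softmax(student_logits)[target_index]

def soft_label_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary: the soft-label KD term."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=1.0):
    """Assumed form: alpha * hard-label CE + (1 - alpha) * soft-label KD."""
    ce = hard_label_loss(student_logits, target_index)
    kd = soft_label_loss(teacher_logits, student_logits, temperature)
    return alpha * ce + (1.0 - alpha) * kd

# Toy vocabulary of four tokens; the teacher spreads mass over two plausible tokens.
teacher = [2.0, 1.8, -1.0, -1.0]
student = [4.0, 0.0, -1.0, -1.0]
loss = combined_loss(student, teacher, target_index=0, alpha=0.5)
```

In this toy case the student is highly confident in token 0 while the teacher treats tokens 0 and 1 as nearly equally plausible, so the KD term penalizes the student's overconfidence even though the hard-label term is small. In practice the loss would be averaged over all token positions in a batch.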
