In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
TL;DR
This work investigates hallucination in LLMs through the lens of inner representations, identifying in-context sharpness as a reliable signal for factuality. It introduces an entropy-based contextual sharpness metric and a constrained decoding method, Activation Decoding, which biases next-token predictions toward tokens with sharper in-context activations. Empirical results across TruthfulQA, TriviaQA, HotpotQA, and Natural Questions show that contextual entropy distinguishes true from false outputs (AUROC > 0.75) and that Activation Decoding yields consistent factuality gains across model sizes, with practical inference-time optimizations. The approach highlights a practical pathway for mitigating model-related hallucinations and enhances understanding of how hidden states encode factual knowledge, while acknowledging inherent trade-offs and scope limitations.
Abstract
Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we investigate the underlying mechanisms of LLM hallucinations from the perspective of inner representations and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect ones. Leveraging this insight, we propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach. Experiments on various knowledge-seeking and hallucination benchmarks demonstrate our approach's consistent effectiveness, for example, achieving up to an 8.6-point improvement on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
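
To make the idea of an entropy-based sharpness score concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the dot-product scoring between in-context hidden states and candidate-token embeddings, the function names, and the penalty weight `alpha` are all assumptions chosen for illustration. The sketch scores each candidate token by the entropy of its activation distribution over the in-context positions (lower entropy means sharper), then subtracts a scaled entropy from the original logits so that sharper tokens are favored.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def contextual_entropy(context_hidden, token_embeddings):
    """Entropy of each candidate token's activation over in-context positions.

    context_hidden:   (n_ctx, d) hidden states of the in-context tokens
    token_embeddings: (n_vocab, d) embeddings of the candidate tokens
    Assumption: activation = dot product, normalized over context positions.
    """
    scores = token_embeddings @ context_hidden.T      # (n_vocab, n_ctx)
    probs = softmax(scores, axis=1)                   # distribution over context
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)  # (n_vocab,)

def adjust_logits(logits, entropy, alpha=1.0):
    # Hypothetical adjustment: penalize tokens with diffuse (high-entropy)
    # in-context activations; alpha controls the penalty strength.
    return logits - alpha * entropy
```

In this sketch, a candidate whose activation concentrates on one context position gets a lower entropy than one whose activation is spread uniformly, so with equal original logits the sharper candidate ranks first after adjustment. Which layer's hidden states to use and how strongly to weight the penalty are left open here.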
