Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu
TL;DR
This paper tackles fact-conflicting hallucinations in large language models by introducing self-endorsement, a prompting-based inference-time framework that performs fine-grained fact-level cross-response verification across multiple samples. By decomposing each candidate into facts and computing endorsement scores via cross-candidate checks (and optional context pruning), it either selects the best candidate or regenerates a final answer conditioned on high-quality facts. Empirical results on Biographies, TriviaQA, and GSM8K show notable factuality gains for open-source and smaller LLMs, with endorsement scores correlating positively with factuality and improvements persisting under various hyperparameters. The approach offers a practical, scalable solution for reducing hallucinations in real-world settings and has potential for broader application beyond the tested domains.
Abstract
This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. In particular, we propose a self-endorsement framework that leverages fine-grained, fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022; Chen et al., 2023) that perform response-level selection, our approach better alleviates hallucinations, especially for long-form generation tasks. Our approach can broadly benefit smaller and open-source LLMs, as it mainly conducts simple content-based comparisons. Experiments on Biographies show that our method effectively improves the factuality of generations with simple and intuitive prompts across different scales of LLMs. In addition, comprehensive analyses on TriviaQA and GSM8K demonstrate the potential of self-endorsement for broader application.
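To make the selection variant concrete, here is a minimal Python sketch of fact-level cross-response endorsement. It is an illustration under stated assumptions, not the authors' implementation: the framework described here uses LLM prompts both to decompose responses into facts and to verify them, whereas this sketch substitutes a sentence splitter and a word-overlap check so that it runs without a model. The names `split_into_facts`, `toy_endorses`, `endorsement_score`, and `select_best` are hypothetical, and averaging fact scores into a response score is an assumed aggregation rule.

```python
from typing import Callable, List


def split_into_facts(response: str) -> List[str]:
    # Placeholder decomposition (assumption): treat each sentence as one fact.
    # The framework itself would prompt an LLM for this step.
    return [s.strip() for s in response.split(".") if s.strip()]


def toy_endorses(fact: str, other_response: str) -> bool:
    # Placeholder verifier (assumption): a fact is endorsed when every word
    # in it also appears in the other response. A real verifier would be an
    # LLM prompt asking whether the other response supports the fact.
    fact_words = set(fact.lower().split())
    other_words = set(other_response.lower().replace(".", " ").split())
    return fact_words <= other_words


def endorsement_score(
    fact: str, others: List[str], endorses: Callable[[str, str], bool]
) -> float:
    # Fraction of the other sampled responses that endorse this fact.
    if not others:
        return 0.0
    return sum(endorses(fact, other) for other in others) / len(others)


def select_best(
    candidates: List[str],
    endorses: Callable[[str, str], bool] = toy_endorses,
) -> str:
    # Score each candidate by the mean endorsement of its facts (assumed
    # aggregation) and return the highest-scoring one; ties keep the
    # earliest candidate.
    best_index, best_score = 0, -1.0
    for i, candidate in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        facts = split_into_facts(candidate)
        if facts:
            score = sum(
                endorsement_score(f, others, endorses) for f in facts
            ) / len(facts)
        else:
            score = 0.0
        if score > best_score:
            best_index, best_score = i, score
    return candidates[best_index]


samples = [
    "Ada Lovelace was born in 1815. She was a mathematician.",
    "Ada Lovelace was born in 1815. She was a mathematician.",
    "Ada Lovelace was born in 1915. She was a painter.",
]
best = select_best(samples)
```

In this toy run the third sample receives no endorsement from the other two, so one of the mutually consistent samples is selected. The regeneration variant would instead keep the individually well-endorsed facts and condition a new generation on them, a step that requires a model call and is omitted here.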
