HalluCana: Fixing LLM Hallucination with A Canary Lookahead
Tianyi Li, Erenay Dayanik, Shubhi Tyagi, Andrea Pierleoni
TL;DR
HalluCana introduces a white-box canary lookahead that detects and corrects factuality hallucinations in long-form LLM outputs by leveraging internal hidden-state representations. It combines a pre-hoc CL_0 scorer and an ad-hoc CL_x scorer, applied at critical decoding steps identified via logit entropy, with a veto mechanism and a weighted logit lookahead that steer generation toward faithful content. The faithfulness classifiers are trained on out-of-domain QA data using two supervision signals, QA accuracy and context familiarity derived from pre-training corpora, which enables robust, knowledge-free hallucination detection. On biography generation, HalluCana achieves up to 2.5x improvements in factuality with over 6x less compute than state-of-the-art baselines, and the results reveal a strong link between LLM internal factuality representations and context familiarity. The findings suggest that context familiarity signals are robust across tasks and can underpin efficient, knowledge-free mitigation of hallucinations in long-form generation.
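As an illustration of selecting critical decoding steps by logit entropy, the sketch below computes the Shannon entropy of the softmax distribution at each step and flags steps whose entropy exceeds a threshold. This is a minimal sketch, not the authors' implementation: the function names, the fixed threshold, and the toy logits are all assumptions made for the example, and the summary above does not specify how the entropy criterion is actually applied.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution implied by the logits."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def critical_steps(step_logits, threshold):
    """Return indices of decoding steps whose entropy exceeds `threshold`.

    `threshold` is a hypothetical hyperparameter chosen for this sketch.
    """
    return [
        i for i, logits in enumerate(step_logits)
        if logit_entropy(logits) > threshold
    ]


# Toy example: three decoding steps over a four-token vocabulary.
toy_logits = [
    [10.0, 0.0, 0.0, 0.0],  # confident step, low entropy
    [0.0, 0.0, 0.0, 0.0],   # uniform step, maximal entropy
    [5.0, 5.0, 0.0, 0.0],   # two competing candidates
]
flagged = critical_steps(toy_logits, threshold=1.0)
```

In a setup like this, only the flagged steps would be passed to the more expensive faithfulness scorer, which is one plausible reading of how restricting checks to high-uncertainty steps saves compute.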
Abstract
In this paper, we present HalluCana, a canary lookahead to detect and correct factuality hallucinations of Large Language Models (LLMs) in long-form generation. HalluCana detects and intervenes as soon as traces of hallucination emerge, during and even before generation. To support timely detection, we exploit the internal factuality representation in the LLM hidden space: we investigate various proxies for the LLM's factuality self-assessment and discuss how this representation relates to the model's context familiarity from pre-training. On biography generation, our method improves generation quality by up to 2.5x while consuming over 6x less compute.
