HalluCana: Fixing LLM Hallucination with A Canary Lookahead

Tianyi Li, Erenay Dayanik, Shubhi Tyagi, Andrea Pierleoni

TL;DR

HalluCana introduces a white-box canary lookahead that detects and corrects factuality hallucinations in long-form LLM outputs by leveraging the model's internal hidden-state representations. It combines a pre-hoc scorer (CL_0) and an ad-hoc scorer (CL_x), applied at critical decoding steps identified via logit entropy, with a veto mechanism and a weighted logit lookahead that steer generation toward faithful content. The faithfulness classifiers are trained on out-of-domain QA data using two supervision signals, QA accuracy and context familiarity derived from pre-training corpora, which enables robust, knowledge-free hallucination detection. On biography generation, HalluCana achieves up to a 2.5x improvement in factuality with over 6x less compute than state-of-the-art baselines, and the results reveal a strong link between the LLM's internal factuality representations and context familiarity. The findings suggest that context familiarity signals are robust across tasks and can underpin efficient, knowledge-free mitigation of hallucinations in long-form generation.
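As an illustration of the "critical decoding steps identified via logit entropy" idea, the following minimal sketch computes the Shannon entropy of the softmax distribution at each decoding step and flags steps whose entropy exceeds a threshold. The function names (`logit_entropy`, `critical_steps`), the simple fixed-threshold rule, and the threshold value are assumptions made for illustration; this summary does not specify how the paper selects critical steps from the entropy signal.

```python
import math
from typing import List, Sequence


def logit_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits).

    The maximum logit is subtracted before exponentiating for
    numerical stability; this does not change the softmax output.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Skip zero-probability entries, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def critical_steps(step_logits: Sequence[Sequence[float]],
                   threshold: float) -> List[int]:
    """Return indices of decoding steps whose entropy exceeds `threshold`.

    Hypothetical selection rule: a fixed threshold on per-step entropy.
    """
    return [t for t, logits in enumerate(step_logits)
            if logit_entropy(logits) > threshold]


if __name__ == "__main__":
    # Toy logits over a 4-token vocabulary for three decoding steps.
    steps = [
        [10.0, 0.0, 0.0, 0.0],  # confident: one token dominates
        [0.0, 0.0, 0.0, 0.0],   # maximally uncertain: uniform
        [5.0, 5.0, 0.0, 0.0],   # two plausible candidates
    ]
    for t, logits in enumerate(steps):
        print(t, round(logit_entropy(logits), 3))
    print(critical_steps(steps, threshold=1.0))
```

Under this sketch, a faithfulness scorer would only need to run at the flagged steps rather than at every token, which is one plausible reading of how restricting the lookahead to critical steps saves compute.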

Abstract

In this paper, we present HalluCana, a canary lookahead to detect and correct factuality hallucinations of Large Language Models (LLMs) in long-form generation. HalluCana detects and intervenes as soon as traces of hallucination emerge, during and even before generation. To support timely detection, we exploit the internal factuality representation in the LLM hidden space, where we investigate various proxies to the LLMs' factuality self-assessment, and discuss its relation to the models' context familiarity from their pre-training. On biography generation, our method improves generation quality by up to 2.5x, while consuming over 6 times less compute.

Paper Structure

This paper contains 36 sections, 11 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Diagram illustration of HalluCana in action.
  • Figure 2: FActScore-Rejection curves of various classification approaches, applied to greedy-decoded generations of Falcon-7b-instruct on the FActScore dev set.