Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data
Katja Filippova
TL;DR
The paper tackles data-induced hallucinations in neural text generation by introducing a low-overhead, architecture-agnostic "hallucination knob": each input is prefixed with a hallucination level that controls how faithful the output is. It defines two scalable hallucination-detection schemes, $hal_{WO}$ and $hal_{LM}$, which label training examples with noise levels; these labels are then used to condition generation. In experiments on WikiBio, controlled models achieve substantially higher faithfulness while preserving fluency and keeping coverage comparable, and LM-based detection often yields better human-evaluated quality than overlap-based detection. The work demonstrates that faithful, fluent, and comprehensive outputs can be achieved without modifying model architectures, suggesting practical applicability to noisy data regimes and to broader controlled-generation tasks.
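As a rough illustration of the idea summarized above, the following sketch labels a training pair with a noise level and prefixes the input with a matching control token. It is not the paper's implementation: the scoring function is a simplified stand-in for an overlap-based detector (not the paper's exact $hal_{WO}$ definition), and the function names, the number of levels, and the `<hal_k>` token format are all assumptions made for this example.

```python
def unsupported_fraction(source_tokens, target_tokens):
    """Fraction of target tokens that never appear in the source.

    Simplified stand-in for an overlap-based hallucination score
    (assumption: not the paper's exact hal_WO definition).
    """
    if not target_tokens:
        return 0.0
    source_vocab = set(source_tokens)
    unsupported = sum(1 for tok in target_tokens if tok not in source_vocab)
    return unsupported / len(target_tokens)


def to_level(score, num_levels=4):
    """Bucket a score in [0, 1] into an integer level 0..num_levels-1.

    The number of levels is a hypothetical choice for this sketch.
    """
    return min(int(score * num_levels), num_levels - 1)


def add_control_prefix(source_tokens, level):
    """Prepend a control token (hypothetical format) to the input tokens."""
    return [f"<hal_{level}>"] + list(source_tokens)


# Training time: label the pair by how noisy its target is.
source = ["name", "john", "smith", "born", "1950"]
target = ["john", "smith", "was", "born", "in", "1950"]
level = to_level(unsupported_fraction(source, target))
train_input = add_control_prefix(source, level)

# Inference time: request the most faithful output by fixing the level to 0.
test_input = add_control_prefix(source, 0)
```

Because the control signal lives entirely in the input sequence, any sequence-to-sequence model can consume it unchanged, which is what makes the approach architecture-agnostic.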
Abstract
Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, that is, generate fluent but unsupported text. Our contribution is a simple but powerful technique to treat such hallucinations as a controllable aspect of the generated text, without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique in both an automatic and a human evaluation.
