Generation Constraint Scaling Can Mitigate Hallucination

Georgios Kollias; Payel Das; Subhajit Chaudhury

TL;DR

The paper addresses hallucinations in large language models by leveraging explicit memory mechanisms. It introduces a geometry-informed approach in the Larimar memory-augmented decoder, using a scaled readout vector to constrain generation without retraining, and contrasts it with GRACE, a training-based editing method. Empirical results on WikiBio show that scaling the memory readout by $s$ in the range $3$–$4$ substantially reduces hallucinations (e.g., RougeL up to $0.72$, Jaccard similarly improved) and can outperform GRACE, while requiring far less computation. The findings suggest that simple, memory-oriented, training-free interventions can offer strong, practical mitigation of hallucinations in memory-augmented LLMs.

Abstract

Addressing hallucinations in large language models (LLMs) is a critical challenge. Because the cognitive mechanisms of hallucination have been related to memory, we explore hallucination in an LLM equipped with explicit memory mechanisms. We empirically demonstrate that simply scaling the readout vector that constrains generation in a memory-augmented LLM decoder mitigates hallucination in a training-free manner. Our method is geometry-inspired and outperforms a state-of-the-art LLM editing method on the task of generating Wikipedia-like biography entries, in terms of both generation quality and runtime complexity.
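The scaling step described above can be illustrated with a minimal sketch. It assumes the readout is available as a plain NumPy vector; the names `scale_readout` and `cosine`, the vector size, and the random stand-in latent are hypothetical and are not taken from the Larimar code base. The sketch only shows the geometric effect of multiplying the readout by $s$, not the decoder itself.

```python
import numpy as np

def scale_readout(z_readout, s):
    """Multiply the readout latent by a scalar s before it conditions the decoder."""
    return s * np.asarray(z_readout, dtype=float)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-in latent (assumption): a real readout would come from the memory module.
rng = np.random.default_rng(0)
z_readout = rng.normal(size=16)

for s in (1.0, 3.0, 4.0):
    z_scaled = scale_readout(z_readout, s)
    norm_ratio = np.linalg.norm(z_scaled) / np.linalg.norm(z_readout)
    # For positive s the direction is unchanged and only the length grows by s.
    print(f"s={s}: norm ratio={norm_ratio:.2f}, cosine={cosine(z_scaled, z_readout):.2f}")
```

For positive $s$ the operation changes only the length of the readout vector and leaves its direction intact, which is why it needs no retraining and adds a single multiplication to inference.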

Paper Structure (6 sections, 6 figures, 1 table)

Introduction and Background
Larimar
GRACE
Experiments
Discussion
Example of WikiBio generation

Figures (6)

Figure 1: Larimar pipeline for processing (prompt, input) pairs. Here model refers explicitly to Larimar decoder. Larimar encoder is implicitly involved in converting tokens in write and the query prompt (prompt bracketed by [CLS], [SEP] tokens) into latent vectors.
Figure 2: readout, generate pair.
Figure 3: write, readout pair: enforcing the randomly chosen phrase: "Try to come up with the next wikibio sentence." as the query prompt.
Figure 4: write, readout pair.
Figure 5: Jaccard similarity for different scaling factors $s$ for $\mathbf{z}_{\texttt{readout}}$ in Larimar. Mean Jaccard similarity scores for the ideal case in Larimar ($\mathbf{z}_{\texttt{write}} = \mathbf{z}_{\texttt{readout}}$) and for GRACE are also plotted as horizontal lines for comparison.
...and 1 more figure
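Figure 5 reports Jaccard similarity between generated and reference text. A minimal sketch of such a score follows, assuming lowercased whitespace tokenization; the tokenization used in the paper is not stated here, and the function name `jaccard_similarity` is hypothetical.

```python
def jaccard_similarity(generated: str, reference: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    # Assumption: lowercased whitespace tokens stand in for the paper's tokenization.
    gen_tokens = set(generated.lower().split())
    ref_tokens = set(reference.lower().split())
    if not gen_tokens and not ref_tokens:
        return 1.0  # two empty texts are treated as identical
    return len(gen_tokens & ref_tokens) / len(gen_tokens | ref_tokens)

# Two shared words out of four distinct words overall.
score = jaccard_similarity("a b c", "b c d")
print(score)
```

A score of 1 means the two texts use exactly the same set of words, and a score of 0 means they share none, so higher values indicate generations that stay closer to the reference biography.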
