I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

Roi Cohen, Konstantin Dobler, Eden Biran, Gerard de Melo

TL;DR

The paper tackles hallucinations in large language models by enabling explicit uncertainty signaling through a new [IDK] token. It introduces IDK-tuning, a continued pretraining objective that shifts probability mass away from incorrect tokens toward [IDK] based on an adaptive Uncertainty Factor $\lambda$ and an upper bound $\Pi$, without requiring labeled data. Across multiple architectures (e.g., Mistral-7B-v0.1 and bert-base-cased), IDK-tuning yields substantial gains in factual precision with only modest reductions in recall, and scalability analyses show benefits increasing with model size. While computationally intensive and potentially affecting long-form generation, the method offers a practical mechanism for uncertainty-aware generation that can be combined with other safety checks and downstream finetuning.

Abstract

Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we propose a novel calibration method that can be used to combat hallucinations. We add a special [IDK] ("I don't know") token to the model's vocabulary and introduce an objective function that shifts probability mass to the [IDK] token for incorrect predictions. This approach allows the model to express uncertainty in its output explicitly. We evaluate our proposed method across multiple model architectures and factual downstream tasks. We find that models trained with our method are able to express uncertainty in places where they would previously make mistakes while suffering only a small loss of encoded knowledge. We further perform extensive ablation studies of multiple variations of our approach and provide a detailed analysis of the precision-recall tradeoff of our method.
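
The abstract describes the objective only in words, so the following is a minimal sketch of how such a target could be built for a single token position, not the authors' implementation. It assumes one plausible form of the Uncertainty Factor: $\lambda = \Pi \cdot (1 - p_{\text{gold}} / p_{\text{pred}})$ when the top prediction is wrong, and $\lambda = 0$ otherwise. The function names `idk_target` and `cross_entropy`, the toy vocabulary, and the default $\Pi = 0.5$ are hypothetical choices for illustration.

```python
import math


def idk_target(probs, gold, idk, upper_bound=0.5):
    """Build a soft target for one position (assumed form, see lead-in).

    probs:       model probabilities over the vocabulary
    gold:        index of the gold (correct) token
    idk:         index of the [IDK] token
    upper_bound: Pi, the maximum mass that may be moved onto [IDK]
    Returns (target distribution, lambda).
    """
    target = [0.0] * len(probs)
    pred = max(range(len(probs)), key=probs.__getitem__)
    if pred == gold:
        # Correct prediction: keep the usual one-hot target.
        target[gold] = 1.0
        return target, 0.0
    # Wrong prediction: the further the gold token trails the top
    # prediction, the more mass is shifted to [IDK], capped at Pi.
    lam = upper_bound * (1.0 - probs[gold] / probs[pred])
    target[gold] = 1.0 - lam
    target[idk] = lam
    return target, lam


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a soft target and the model distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs) if t > 0)


# Toy vocabulary of four tokens; index 3 plays the role of [IDK].
probs = [0.1, 0.6, 0.2, 0.1]
target, lam = idk_target(probs, gold=2, idk=3, upper_bound=0.5)
loss = cross_entropy(target, probs)
```

In this toy case the model's top prediction (index 1) is wrong, so part of the target mass moves from the gold token to [IDK]. Under the assumed formula, $\lambda$ always stays within $[0, \Pi]$, because the gold probability can never exceed the top prediction's probability.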

Paper Structure

This paper contains 36 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of our proposed IDK objective. During continual pretraining, we shift some probability mass of wrong predictions towards a special [IDK] token. The amount of shifted probability mass depends on the uncertainty in the model's prediction. We detail our method in the method section.
  • Figure 2: Average performance of IDK-tuned models on closed-book factual sentence completion benchmarks as a function of parameter count. The 70M to 2.8B points are pythia-70m through pythia-2.8b; the 7.0B point is Mistral-7B-v0.1.
  • Figure 3: Ablation study of different values for the $\Pi$ factor that controls the upper bound of probability mass put on [IDK] in the target.
  • Figure 4: Tradeoff between IDK recall and IDK error rate for different parameter combinations. We annotate each data point with its corresponding $\Pi$ value.