Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy

Benedict Aaron Tjandra, Muhammed Razzak, Jannik Kossen, Kunal Handa, Yarin Gal

TL;DR

This work proposes fine-tuning LLMs to abstain using semantic entropy, an uncertainty measure that is derived from introspection into the model and requires no external labels. The approach matches or outperforms models fine-tuned with prior methods and performs strongly on both short- and long-form generation across a range of datasets.

Abstract

Large Language Models (LLMs) are known to hallucinate, generating plausible but inaccurate text. This phenomenon poses significant risks in critical applications such as medicine or law, and so calls for robust hallucination mitigation strategies. Recent works have proposed fine-tuning methods that teach LLMs to abstain from answering questions beyond their knowledge or capabilities, but these methods either rely on ground-truth labels or are limited to short-form responses. To address these limitations, we propose fine-tuning using semantic entropy, an uncertainty measure that is derived from introspection into the model and requires no external labels. We demonstrate that our approach matches or outperforms models fine-tuned with prior methods and achieves strong performance on both short- and long-form generation across a range of datasets.
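
To make the idea of a label-free uncertainty signal concrete, here is a minimal sketch of a discrete semantic-entropy estimate: sample several answers to one question, group answers that mean the same thing, and take the entropy of the group frequencies. This illustrates the general technique and is not the authors' implementation. The function names, the greedy clustering, the `normalized_match` equivalence check (a hypothetical stand-in for a real semantic-equivalence test such as an entailment model), and the threshold value are all assumptions.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers: each joins the first cluster whose representative matches."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            clusters.append([ans])
    return clusters


def semantic_entropy(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    probs = [len(c) / len(answers) for c in clusters]
    return sum(p * math.log(1.0 / p) for p in probs)


def should_abstain(
    answers: List[str],
    same_meaning: Callable[[str, str], bool],
    threshold: float,
) -> bool:
    """Flag a question for abstention when the sampled answers disagree too much."""
    return semantic_entropy(answers, same_meaning) > threshold


def normalized_match(a: str, b: str) -> bool:
    """Hypothetical stand-in: treats answers as equivalent if they match after normalization."""
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    samples = ["Paris", "paris", "Paris ", "Lyon"]  # made-up sampled answers
    print(semantic_entropy(samples, normalized_match))
    print(should_abstain(samples, normalized_match, threshold=0.5))  # threshold is illustrative
```

Because the signal depends only on agreement among the model's own samples, no ground-truth answer is needed to decide which questions to mark for abstention.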

Paper Structure

This paper contains 21 sections, 8 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Our method, SE (Llama), matches or outperforms R-Tuning and R-Tuning-U for Long-QA and Short-QA in in-distribution experiments. Mean Accuracy-Engagement Distances (AEDs) are shown on top of each bar. Standard deviations are shown as error bars. The lower the AED, the better.
  • Figure 2: Our method, SE (Llama), matches or outperforms R-Tuning and R-Tuning-U for Long-QA and Short-QA in out-of-distribution experiments. Mean Accuracy-Engagement Distances (AEDs) are shown on top of each bar. Standard deviations are shown as error bars. The lower the AED, the better.
  • Figure 3: SE (Llama) forms a frontier over other methods in the Long-QA Adaptation Plot. Each point represents a fine-tuned model trained at a specific threshold.
  • Figure 4: Long-QA Free-form Prompt.
  • Figure 5: Prompt for Short-QA.
  • ...and 3 more figures