Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

Dingkang Yang, Dongling Xiao, Jinjie Wei, Mingcheng Li, Zhaoyu Chen, Ke Li, Lihua Zhang

TL;DR

This work tackles factuality in large language models by introducing a decoding-time intervention called Comparator-driven Decoding-Time (CDT). CDT steers next-token predictions by contrasting the base model with a hallucinatory comparator and a truthful comparator, formalized as $p_{cdt}(y_t|x,y_{<t}) \propto p_{\theta}(y_t|x,y_{<t}) \frac{p_{\theta_{f}}(y_t|x,y_{<t})^{\beta}}{p_{\theta_{h}}(y_t|x,y_{<t})^{\gamma}}$, where $p_{\theta}$ is the base model, $p_{\theta_{f}}$ the truthful comparator, and $p_{\theta_{h}}$ the hallucinatory comparator; an adaptive plausibility constraint filters out implausible tokens. The comparators are trained via LoRA-based SFT on multi-task hallucination/factuality data, and an instruction prototype-guided mixture of experts enables task-aware routing across diverse patterns. Empirical results on KNIGHT-Judge, Alpaca-Judge, TruthfulQA, and XSUM demonstrate substantial improvements in factuality and robustness while maintaining fluency, with complementary contributions from both comparators and the mixture-of-experts framework. The approach is model-agnostic and extensible to multiple LLMs, though it incurs modest decoding-time overhead and relies on the availability of task-aligned hallucination/factuality data.
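The contrast rule in the TL;DR is easiest to read in log space, where it becomes $\log p_{\theta} + \beta \log p_{\theta_{f}} - \gamma \log p_{\theta_{h}}$ followed by renormalization. The snippet below is a minimal sketch of a single decoding step under stated assumptions, not the authors' implementation. The function name `cdt_next_token_distribution`, the toy logits, and the default values of `beta`, `gamma`, and `alpha` are hypothetical. The exact form of the adaptive plausibility constraint is not given in the text above; the sketch assumes the threshold used in contrastive decoding, which keeps only tokens whose base-model probability is at least `alpha` times the probability of the base model's top token.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def cdt_next_token_distribution(base_logits, truthful_logits, halluc_logits,
                                beta=1.0, gamma=1.0, alpha=0.1):
    """One CDT-style step: p_cdt ∝ p_base * p_truthful**beta / p_halluc**gamma.

    The plausibility mask (an assumption, borrowed from contrastive decoding)
    drops tokens whose base probability is below alpha * max base probability.
    """
    log_p_base = log_softmax(np.asarray(base_logits, dtype=float))
    log_p_f = log_softmax(np.asarray(truthful_logits, dtype=float))
    log_p_h = log_softmax(np.asarray(halluc_logits, dtype=float))

    # Log-space form of the product/ratio in the formula.
    scores = log_p_base + beta * log_p_f - gamma * log_p_h

    # Assumed plausibility constraint: keep only tokens the base model
    # itself considers reasonably likely.
    keep = log_p_base >= np.log(alpha) + np.max(log_p_base)
    scores = np.where(keep, scores, -np.inf)

    # Renormalize over the surviving tokens.
    return np.exp(log_softmax(scores))


# Toy vocabulary and logits (invented for illustration only).
vocab = ["Harrisburg", "Honolulu", "Philadelphia", "the"]
base = np.array([2.0, 2.2, 1.0, -3.0])      # base model slightly prefers "Honolulu"
truthful = np.array([3.0, 0.5, 1.0, -3.0])  # truthful comparator prefers "Harrisburg"
halluc = np.array([0.5, 3.0, 1.0, -3.0])    # hallucinatory comparator prefers "Honolulu"

p_cdt = cdt_next_token_distribution(base, truthful, halluc)
base_choice = vocab[int(np.argmax(base))]
cdt_choice = vocab[int(np.argmax(p_cdt))]
```

In this toy setting the base model's greedy choice is "Honolulu", while the contrasted distribution moves the greedy choice to "Harrisburg", and the very unlikely token "the" is masked out by the assumed plausibility threshold. Larger `beta` or `gamma` strengthens the pull of the corresponding comparator.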

Abstract

Despite their remarkable capabilities, Large Language Models (LLMs) are prone to generating responses that contradict verifiable facts, i.e., unfaithful hallucinated content. Existing efforts generally focus on optimizing model parameters or editing semantic representations, which compromise the internal factual knowledge of target LLMs. In addition, hallucinations typically exhibit multifaceted patterns in downstream tasks, limiting the model's holistic performance across tasks. In this paper, we propose a Comparator-driven Decoding-Time (CDT) framework to alleviate response hallucination. First, we construct hallucinatory and truthful comparators with multi-task fine-tuning samples. In doing so, we present an instruction prototype-guided mixture of experts strategy to enhance the ability of the corresponding comparators to capture different hallucination or truthfulness patterns in distinct task instructions. CDT constrains next-token predictions to factuality-robust distributions by contrasting the logit differences between the target LLMs and these comparators. Systematic experiments on multiple downstream tasks show that our framework can significantly improve model performance and response factuality.

Paper Structure (23 sections, 9 equations, 7 figures, 12 tables)

Figures (7)

  • Figure 1: Illustration of multifaceted hallucination patterns generated by LLMs on different tasks [das2023diving; zheng2023does].
  • Figure 2: (a) Illustration of our Comparator-driven Decoding-Time (CDT) framework. The hallucinatory comparator assigns higher weight to the incorrect token Honolulu, while the truthful comparator favors the factual token Harrisburg, illustrating effective control over the next-token distribution. With CDT, the target LLM removes the hallucination. (b) We construct the hallucinatory and truthful comparators from explicit hallucinated/factual instruction pairs via LoRA-based SFT. An instruction prototype-guided mixture of experts strategy is introduced during training so that the comparators can help the target LLM improve response factuality across different downstream tasks.
  • Figure 3: We show the effect of the number of experts on the performance of different tasks.
  • Figure 4: The extensibility analysis of our framework across different LLMs through the multiple-choice task on TruthfulQA.
  • Figure 5: We show the effect of the number of instruction prototypes on the performance of different tasks.
  • ...and 2 more figures