Identifying Factual Inconsistencies in Summaries: Grounding LLM Inference via Task Taxonomy

Liyan Xu, Zhenlin Su, Mo Yu, Jin Xu, Jinho D. Choi, Jie Zhou, Fei Liu

TL;DR

This work consolidates the key error types of factually inconsistent summaries, incorporates them into both the zero-shot and supervised paradigms of LLMs, and distills models that fuse the taxonomy into their parameters through designed prompt completions and supervised training strategies.

Abstract

Factual inconsistencies pose a significant hurdle for faithful summarization by generative models. While a major direction for enhancing inconsistency detection is to derive stronger Natural Language Inference (NLI) models, we propose an orthogonal direction that underscores the importance of incorporating a task-specific taxonomy into the inference. To this end, we consolidate the key error types of inconsistent facts in summaries and incorporate them into both the zero-shot and supervised paradigms of LLMs. Extensive experiments on ten datasets from five distinct domains suggest that zero-shot LLM inference benefits from the explicit solution space depicted by the error type taxonomy and achieves state-of-the-art performance overall, surpassing specialized non-LLM baselines as well as recent LLM baselines. We further distill models that fuse the taxonomy into their parameters through our designed prompt completions and supervised training strategies, providing an efficient substitute for state-of-the-art zero-shot inference with much larger LLMs.

Paper Structure (37 sections, 3 figures, 8 tables)

Figures (3)

  • Figure 1: Illustration of our proposed approaches that ground the task inference of factual inconsistency by its taxonomy (Sec. \ref{sec:taxonomy}), via either the zero-shot paradigm (Sec. \ref{sec:zero-shot}) or the supervised paradigm (Sec. \ref{sec:train}) with LLMs.
  • Figure 2: Accuracy of FacTax methods for different summary lengths using ChatGPT.
  • Figure 3: Prompt for FacTax described in Section \ref{sec:zero-shot}. Slots in blue refer to the input document and summary.