Utilizing GPT to Enhance Text Summarization: A Strategy to Minimize Hallucinations

Hassan Shakil, Zeydy Ortiz, Grant C. Forbes

TL;DR

This work tackles hallucinations in AI-generated text summaries by combining extractive, abstractive, and hybrid summarization approaches (DistilBERT and T5) with a GPT-based refinement stage. The pipeline generates unrefined summaries, refines them via prompted GPT evaluations, and assesses the improvements using a diverse metric set that includes FactSumm, QAGS, SummaC, ROUGE, and GPT-3.5 Turbo-derived analyses. Results show significant gains in factual consistency and hallucination reduction, particularly for abstractive and hybrid summaries, though some metrics exhibit mixed responses. The study highlights the need for evaluation frameworks that better capture semantic and factual fidelity in LLM-assisted summarization and proposes a practical path toward more reliable automatic summarization systems.

Abstract

In this research, we use the DistilBERT model to generate extractive summaries and the T5 model to generate abstractive summaries. We also generate hybrid summaries by combining the DistilBERT and T5 models. Central to our research is a GPT-based refinement process that minimizes hallucinations, a common problem in AI-generated summaries. We evaluate the unrefined summaries and, after refinement, assess the refined summaries using a range of traditional and novel metrics, demonstrating marked improvements in the accuracy and reliability of the summaries. The results show a significant reduction in hallucinatory content, thereby increasing the factual integrity of the summaries.
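The generate, refine, and evaluate flow described in the abstract can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the names `run_pipeline`, `toy_summarizer`, `toy_refiner`, and `support_score` are hypothetical, the summarizer and refiner are toy stand-ins for the DistilBERT/T5 and GPT stages, and the token-overlap score is a crude proxy, not FactSumm, QAGS, SummaC, or ROUGE.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type aliases for the three pluggable stages.
Summarizer = Callable[[str], str]        # source -> draft summary
Refiner = Callable[[str, str], str]      # (source, draft) -> refined summary
Metric = Callable[[str, str], float]     # (source, summary) -> score


@dataclass
class PipelineResult:
    unrefined: str
    refined: str
    scores_before: Dict[str, float]
    scores_after: Dict[str, float]


def run_pipeline(source: str, summarize: Summarizer, refine: Refiner,
                 metrics: Dict[str, Metric]) -> PipelineResult:
    """Generate a draft, refine it, and score both versions with the same metrics."""
    draft = summarize(source)
    refined = refine(source, draft)
    before = {name: metric(source, draft) for name, metric in metrics.items()}
    after = {name: metric(source, refined) for name, metric in metrics.items()}
    return PipelineResult(draft, refined, before, after)


def tokens(text: str) -> List[str]:
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_summarizer(source: str) -> str:
    # Stand-in for a model-generated draft; "red" is a deliberate hallucination.
    return "The cat sat on the red mat."


def toy_refiner(source: str, draft: str) -> str:
    # Stand-in for the GPT refinement stage: drop words with no support in the source.
    vocab = set(tokens(source))
    kept = [w for w in draft.split() if all(t in vocab for t in tokens(w))]
    return " ".join(kept)


def support_score(source: str, summary: str) -> float:
    # Fraction of summary tokens that also occur in the source (crude proxy metric).
    vocab = set(tokens(source))
    toks = tokens(summary)
    if not toks:
        return 0.0
    return sum(t in vocab for t in toks) / len(toks)


source_text = "The cat sat on the mat. It purred."
result = run_pipeline(source_text, toy_summarizer, toy_refiner,
                      {"support": support_score})
print(result.unrefined)
print(result.refined)
print(result.scores_before, result.scores_after)
```

Passing the stages in as callables keeps the comparison honest: the same metric functions score the draft and the refined summary, so any change in score is attributable to the refinement step alone.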

Paper Structure

This paper contains 29 sections, 3 figures, 1 table, and 2 algorithms.

Figures (3)

  • Figure 1: Schematic Representation of the Research Methodology
  • Figure 2: Comparison of evaluation metrics for unrefined and refined summaries.
  • Figure 3: Scatter plot illustrating the correlation between pre- and post-refinement scores, with the line of best fit highlighting the general trend of improvement.