BRIDO: Bringing Democratic Order to Abstractive Summarization

Junhyun Lee, Harshith Goka, Hyeonmok Ko

TL;DR

BRIDO tackles hallucination in abstractive summarization by addressing exposure bias and moving beyond reference-led evaluation. It extends BRIO with a democratic ordering scheme that ranks candidate summaries using inter-candidate ROUGE and a contrastive learning objective, formalized via $Score_{BRIDO}(S_i)=\frac{\sum_{j\neq i} R(S_i,S_j)+\alpha R(S_i,S^*)}{N-1+\alpha}$ and $\mathcal{L}=\mathcal{L}_{\text{xent}}+\gamma\mathcal{L}_{\text{ctr}}$, where $\mathcal{L}_{\text{ctr}}=\sum_i\sum_{j>i}\max(0,f(S_j)-f(S_i)+\lambda_{ij})$. Experiments on XSum and CNN/DM show that BRIDO yields 6.25% and 3.82% improvements in G-Eval consistency over BRIO, respectively, and outperforms base models on key hallucination metrics, indicating effective mitigation of hallucination while preserving summarization quality. The approach leverages diverse beam search, inter-candidate similarity, and adjustable parameters ($\eta$, $N_g$, $N$, $\alpha$, $\lambda$, $\gamma$) to balance diversity, faithfulness, and learning signals. The results suggest practical benefits for safer abstractive summarization and point to future work on human evaluation and extending BRIDO to decoder-only models.
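To make the ranking formula concrete, the following is a minimal sketch of the democratic score, not the authors' implementation. The function names (`rouge1_f1`, `brido_scores`, `democratic_order`) are hypothetical, and the similarity function is a simplified unigram-overlap F1 used as a stand-in for the ROUGE function $R$; the ROUGE variant actually used may differ.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1. ASSUMPTION: a simplified stand-in for R(., .)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def brido_scores(candidates, reference, alpha=0.0, sim=rouge1_f1):
    """Score_BRIDO(S_i) = (sum_{j != i} R(S_i, S_j) + alpha * R(S_i, S*)) / (N - 1 + alpha)."""
    n = len(candidates)
    if n < 2:
        raise ValueError("need at least two candidates to compare")
    scores = []
    for i, s_i in enumerate(candidates):
        # Similarity of candidate i to every other candidate (the "votes").
        inter = sum(sim(s_i, s_j) for j, s_j in enumerate(candidates) if j != i)
        # alpha controls how much the reference S* contributes to the score.
        scores.append((inter + alpha * sim(s_i, reference)) / (n - 1 + alpha))
    return scores


def democratic_order(candidates, reference, alpha=0.0):
    """Candidate indices sorted from highest to lowest BRIDO score."""
    scores = brido_scores(candidates, reference, alpha)
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy candidates: the last one disagrees with the majority.
    cands = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "the cat sat on the rug",
        "a dog flew to the moon",
    ]
    ref = "the cat sat on the mat"
    print(brido_scores(cands, ref, alpha=0.0))
    print(democratic_order(cands, ref, alpha=0.0))
```

With $\alpha = 0$ the reference is ignored and the ranking is decided purely by agreement among candidates; a larger $\alpha$ gives the reference a larger weight in the average.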

Abstract

Hallucination refers to inaccurate, irrelevant, and inconsistent text generated by large language models (LLMs). While LLMs have shown great promise in a variety of tasks, hallucination remains a major challenge for many practical uses. In this paper, we tackle hallucination in abstractive text summarization by mitigating exposure bias. Existing models that target exposure bias mitigation, namely BRIO, aim for better summarization quality as measured by the ROUGE score. We propose a model that uses a similar exposure bias mitigation strategy but with an objective aligned with reducing hallucination. We conjecture that, among a group of candidate outputs, those with hallucinations will make up a minority of the group. That is, candidates that are less similar to the others are more likely to contain hallucinated content. Our method exploits this property through contrastive learning, incentivizing candidates with high inter-candidate ROUGE scores. We performed experiments on the XSum and CNN/DM summarization datasets, and our method showed 6.25% and 3.82% improvement, respectively, on the G-Eval consistency score over BRIO.
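The contrastive objective given in the TL;DR, $\mathcal{L}=\mathcal{L}_{\text{xent}}+\gamma\mathcal{L}_{\text{ctr}}$ with $\mathcal{L}_{\text{ctr}}=\sum_i\sum_{j>i}\max(0,f(S_j)-f(S_i)+\lambda_{ij})$, can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the candidates are assumed to be pre-sorted from best to worst by their ranking score, $f(S_i)$ is assumed to be a scalar model score per candidate, and the margin is assumed to follow the rank-scaled form $\lambda_{ij}=(j-i)\lambda$ used in BRIO. The numeric values are illustrative only.

```python
def contrastive_loss(model_scores, margin):
    """Pairwise margin ranking loss over candidates sorted best-first.

    model_scores[i] is f(S_i), the model's score for the i-th ranked candidate.
    ASSUMPTION: lambda_ij = (j - i) * margin, as in BRIO.
    """
    loss = 0.0
    n = len(model_scores)
    for i in range(n):
        for j in range(i + 1, n):
            lam_ij = (j - i) * margin
            # Penalize when the lower-ranked candidate j is not scored
            # at least lam_ij below the higher-ranked candidate i.
            loss += max(0.0, model_scores[j] - model_scores[i] + lam_ij)
    return loss


def total_loss(xent, model_scores, gamma, margin):
    """L = L_xent + gamma * L_ctr."""
    return xent + gamma * contrastive_loss(model_scores, margin)


if __name__ == "__main__":
    # Scores that already respect the ranking incur no contrastive penalty.
    print(contrastive_loss([-0.5, -1.0, -2.0], margin=0.1))
    # Scores in the wrong order are penalized for every misordered pair.
    print(contrastive_loss([-2.0, -1.0, -0.5], margin=0.1))
```

The loss is zero only when the model's scores follow the target ordering with sufficient margins, which is how the ranking derived from inter-candidate similarity is passed to the model as a training signal.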

Paper Structure

This paper contains 26 sections, 3 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: The BRIDO framework compared with BRIO (Liu et al., 2022) and the base models (BART/Pegasus). BRIDO ranks the candidates based on inter-candidate similarity, while BRIO ranks them using a reference-based similarity score. In both cases, similarity is measured by the ROUGE score.