Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding

Yuu Jinnai, Ukyo Honda, Tetsuro Morimura, Peinan Zhang

TL;DR

This work tackles the challenge of producing text that is both high quality and diverse by extending Minimum Bayes Risk (MBR) decoding with diversity objectives. It introduces two methods: Diverse MBR (DMBR), which adds a diversity-promoting term to the standard MBR objective, and $k$-Medoids MBR (KMBR), which selects outputs by clustering. Empirical evaluation across machine translation, image captioning, question generation, common-sense reasoning, and summarization shows that both approaches achieve a better quality-diversity trade-off than diverse beam search and sampling baselines, with DMBR often yielding stronger diversity and competitive quality as the sample size grows. The methods are slower because of the additional pairwise computations, but they offer a principled way to generate a diverse set of high-quality outputs. Improving efficiency and applying the methods to more open-ended tasks are left to future work.

Abstract

One of the most important challenges in text generation systems is to produce outputs that are not only correct but also diverse. Recently, Minimum Bayes-Risk (MBR) decoding has gained prominence for generating sentences of the highest quality among the decoding algorithms. However, existing algorithms for generating diverse outputs are predominantly based on beam search or random sampling, so their output quality is capped by these underlying methods. In this paper, we investigate an alternative approach: we develop diversity-promoting decoding algorithms by imposing diversity objectives on MBR decoding. We propose two variants of MBR, Diverse MBR (DMBR) and $k$-medoids MBR (KMBR), which generate a set of sentences with high quality and diversity. We evaluate DMBR and KMBR on a variety of directed text generation tasks using encoder-decoder models and a large language model with prompting. The experimental results show that the proposed methods achieve a better quality-diversity trade-off than the diverse beam search and sampling algorithms.
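
The idea of adding a diversity objective to MBR decoding can be illustrated with a small sketch. The abstract does not give the exact DMBR objective, so the code below is an assumption-laden illustration rather than the authors' method: it scores each candidate by its average utility against the sampled outputs (the usual MBR quality estimate) and then greedily picks $k$ outputs, subtracting a penalty for similarity to outputs already chosen. The names `jaccard`, `diverse_mbr`, and the weight `lam` are hypothetical, and token-set overlap stands in for whatever utility function a real system would use.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Toy utility (an assumption): token-set overlap between two sentences."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def diverse_mbr(
    samples: List[str],
    k: int,
    lam: float,
    utility: Callable[[str, str], float] = jaccard,
) -> List[str]:
    """Greedily select up to k outputs, trading MBR quality against similarity.

    Illustrative only: the penalty form and greedy search are assumptions,
    not the formulation from the paper.
    """
    # Candidates are the unique samples, in first-seen order.
    candidates = list(dict.fromkeys(samples))

    # MBR quality estimate: average utility against all samples,
    # which serve as pseudo-references drawn from the model.
    quality: Dict[str, float] = {
        h: sum(utility(h, y) for y in samples) / len(samples)
        for h in candidates
    }

    selected: List[str] = []
    while len(selected) < min(k, len(candidates)):
        best, best_gain = None, float("-inf")
        for h in candidates:
            if h in selected:
                continue
            # Penalize similarity to the outputs already selected.
            penalty = sum(utility(h, s) for s in selected)
            gain = quality[h] - lam * penalty
            if gain > best_gain:
                best, best_gain = h, gain
        selected.append(best)
    return selected


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat", "the cat sat down", "a dog ran"]
    print(diverse_mbr(samples, k=2, lam=0.0))  # quality only
    print(diverse_mbr(samples, k=2, lam=1.0))  # quality minus similarity penalty
```

With `lam=0.0` the selection reduces to taking the top-$k$ candidates by MBR score, which tends to return near-duplicates; a larger `lam` pushes the second pick away from the first.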

Paper Structure

This paper contains 32 sections, 15 equations, 12 figures, and 16 tables.

Figures (12)

  • Figure 1: Evaluation of P-BLEU and distinct-2 as a function of mean BLEU on WMT'19 De-En and Ru-En. The size of the outputs $k$ is set to 4. $\uparrow$ and $\downarrow$ denote that larger and smaller are better in diversity, respectively.
  • Figure 2: Evaluation of P-BLEU and distinct-2 as a function of max BLEU (Oracle score) on WMT'19 De-En. The size of the outputs $k$ is set to 4.
  • Figure 3: Evaluation of DMBR and KMBR with varying number of outputs ($k \in \{4, 8, 12\}$). Mean BLEU, P-BLEU, and distinct-2 on WMT'19 De-En are reported.
  • Figure 4: Evaluation of DMBR and KMBR with varying number of samples ($N \in \{32, 64, 128\}$). Mean BLEU, P-BLEU, and distinct-2 on WMT'19 De-En are reported.
  • Figure 5: Evaluation of DMBR and KMBR using varying sampling algorithms: ancestral sampling, nucleus sampling, top-$k$ sampling, and epsilon sampling. Mean BLEU, P-BLEU, and distinct-2 on WMT'19 De-En are reported.
  • ...and 7 more figures
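
The figure captions report distinct-2 as a diversity measure. As a rough guide to what that number means, the sketch below computes distinct-$n$ in its common form: the number of unique $n$-grams divided by the total number of $n$-grams, pooled over a set of outputs. Whitespace tokenization and pooling over the whole output set are assumptions made here for illustration; the paper's exact tokenization and aggregation are not stated in this excerpt, and the function name `distinct_n` is hypothetical.

```python
from typing import List, Tuple


def distinct_n(sentences: List[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams across all sentences.

    Higher values mean less n-gram repetition, i.e. more diverse outputs.
    Returns 0.0 when no sentence is long enough to contain an n-gram.
    """
    ngrams: List[Tuple[str, ...]] = []
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenization
        ngrams.extend(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    outputs = ["the cat sat", "the cat ran"]
    # Bigrams: (the, cat), (cat, sat), (the, cat), (cat, ran) -> 3 unique of 4
    print(distinct_n(outputs, n=2))
```

P-BLEU, the other diversity measure in the captions, is a pairwise similarity score among the outputs, which is why the captions mark lower values as better for it and higher values as better for distinct-2.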