Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding

Yuu Jinnai; Ukyo Honda; Tetsuro Morimura; Peinan Zhang

TL;DR

This work addresses the challenge of producing text that is both high quality and diverse by extending Minimum Bayes Risk (MBR) decoding with diversity objectives. It introduces two methods: Diverse MBR (DMBR), which adds a diversity-promoting term to the standard MBR objective, and $k$-Medoids MBR (KMBR), which selects outputs by clustering. Empirical evaluation across machine translation, image captioning, question generation, common-sense reasoning, and summarization shows that both approaches achieve a better quality-diversity trade-off than diverse beam search and sampling baselines, with DMBR often yielding stronger diversity and competitive quality as the sample size grows. Both methods are slower because of the additional pairwise computations, but they offer a principled way to generate a diverse set of high-quality outputs. Improving efficiency and applying the methods to more open-ended tasks are left for future work.
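To make the idea of combining the MBR objective with a diversity term concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the exact DMBR objective is not given in this summary, so the sketch assumes a pairwise-similarity penalty weighted by a hypothetical parameter `lam` and a greedy selection loop. The utility `token_f1` is a toy stand-in for a real metric such as BLEU, and all function names are invented for this example.

```python
from typing import Callable, List, Sequence


def token_f1(a: str, b: str) -> float:
    """Toy utility (assumption): F1 overlap of unique whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    overlap = len(sa & sb)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(sa), overlap / len(sb)
    return 2 * p * r / (p + r)


def expected_utility(samples: Sequence[str],
                     utility: Callable[[str, str], float]) -> List[float]:
    """Monte Carlo estimate of each candidate's expected utility,
    using the samples themselves as pseudo-references."""
    n = len(samples)
    return [sum(utility(h, y) for y in samples) / n for h in samples]


def mbr(samples: Sequence[str],
        utility: Callable[[str, str], float] = token_f1) -> str:
    """Standard MBR: return the single candidate with the highest expected utility."""
    scores = expected_utility(samples, utility)
    return samples[max(range(len(samples)), key=scores.__getitem__)]


def greedy_diverse_mbr(samples: Sequence[str], k: int, lam: float = 1.0,
                       utility: Callable[[str, str], float] = token_f1) -> List[str]:
    """Greedily pick k candidates, trading expected utility against
    similarity to the candidates already chosen (assumed penalty form)."""
    scores = expected_utility(samples, utility)
    chosen: List[int] = []
    remaining = list(range(len(samples)))
    while remaining and len(chosen) < k:
        def gain(i: int) -> float:
            penalty = sum(utility(samples[i], samples[j]) for j in chosen)
            return scores[i] - lam * penalty
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return [samples[i] for i in chosen]


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat", "the cat sat down", "a dog ran"]
    print(mbr(samples))
    print(greedy_diverse_mbr(samples, k=2, lam=0.0))  # quality only
    print(greedy_diverse_mbr(samples, k=2, lam=1.0))  # quality and diversity
```

With `lam=0.0` the selection reduces to taking the top-`k` candidates by expected utility, which can return near-duplicates; increasing `lam` pushes the selection toward dissimilar candidates. Every candidate is compared against every sample and against the chosen set, which is consistent with the extra pairwise cost mentioned above.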

Abstract

One of the most important challenges in text generation systems is to produce outputs that are not only correct but also diverse. Recently, Minimum Bayes-Risk (MBR) decoding has gained prominence for generating sentences of the highest quality among the decoding algorithms. However, existing algorithms proposed for generating diverse outputs are predominantly based on beam search or random sampling, so their output quality is capped by these underlying methods. In this paper, we investigate an alternative approach: we develop diversity-promoting decoding algorithms by imposing diversity objectives on MBR decoding. We propose two variants of MBR, Diverse MBR (DMBR) and $k$-medoids MBR (KMBR), which generate a set of sentences with high quality and diversity. We evaluate DMBR and KMBR on a variety of directed text generation tasks using encoder-decoder models and a large language model with prompting. The experimental results show that the proposed methods achieve a better quality-diversity trade-off than the diverse beam search and sampling algorithms.
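As a rough illustration of the clustering-based variant, the following Python sketch selects `k` outputs as the medoids of a $k$-medoids clustering over sampled candidates. It rests on several assumptions that this abstract does not confirm: distance is taken as one minus a toy token-overlap utility (`token_f1`), the clustering uses a simple alternating assign-and-update loop rather than any specific algorithm from the paper, and the initial medoids are simply the first `k` samples. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def token_f1(a: str, b: str) -> float:
    """Toy utility (assumption): F1 overlap of unique whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    overlap = len(sa & sb)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(sa), overlap / len(sb)
    return 2 * p * r / (p + r)


def k_medoids_select(samples: Sequence[str], k: int,
                     utility: Callable[[str, str], float] = token_f1,
                     max_iter: int = 50) -> List[str]:
    """Return k medoids of the samples under distance = 1 - utility."""
    n = len(samples)
    k = min(k, n)
    dist = [[1.0 - utility(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))  # deterministic initialization (assumption)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest medoid;
        # a medoid always stays in its own cluster.
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid of each cluster is the member with
        # the smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [samples[m] for m in medoids]


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat down", "a dog ran", "a dog ran fast"]
    print(k_medoids_select(samples, k=2))
```

Because each returned sentence is central to a different cluster, the outputs tend to be both representative of the sample set and distinct from one another. Building the full distance matrix requires a utility call for every pair of samples.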

Paper Structure (32 sections, 15 equations, 12 figures, 16 tables)

This paper contains 32 sections, 15 equations, 12 figures, and 16 tables.

Introduction
Background
Decoding Algorithms for Diversity
Random sampling.
Diversity-aware beam search.
Minimum Bayes Risk (MBR) Decoding
Minimum Bayes Risk Decoding with Diversity
Diverse MBR (DMBR)
$k$-Medoids MBR (KMBR)
Experiments
Machine Translation
DMBR achieves higher diversity than baselines.
DMBR achieves more flexibility than DBS on the quality-diversity trade-off.
DMBR achieves higher oracle score than vanilla MBR.
DMBR outperforms DBS with varying output sizes.
...and 17 more sections

Figures (12)

Figure 1: Evaluation of P-BLEU and distinct-2 as a function of mean BLEU on WMT'19 De-En and Ru-En. The size of the outputs $k$ is set to 4. $\uparrow$ and $\downarrow$ denote that larger and smaller are better in diversity, respectively.
Figure 2: Evaluation of P-BLEU and distinct-2 as a function of max BLEU (Oracle score) on WMT'19 De-En. The size of the outputs $k$ is set to 4.
Figure 3: Evaluation of DMBR and KMBR with varying number of outputs ($k \in \{4, 8, 12\}$). Mean BLEU, P-BLEU, and distinct-2 on WMT'19 De-En are reported.
Figure 4: Evaluation of DMBR and KMBR with varying number of samples ($N \in \{32, 64, 128\}$). Mean BLEU, P-BLEU, and distinct-2 on WMT'19 De-En are reported.
Figure 5: Evaluation of DMBR and KMBR using varying sampling algorithms: ancestral sampling, nucleus sampling, top-$k$ sampling, and epsilon sampling. Mean BLEU, P-BLEU, and distinct-2 on WMT'19 De-En are reported.
...and 7 more figures
