Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms
Firas Trabelsi, David Vilar, Mara Finkelstein, Markus Freitag
TL;DR
This work tackles the computational bottleneck of Minimum Bayes Risk (MBR) decoding in neural MT by showing empirically that the MBR score matrix is approximately low-rank and can be completed from a small observed subset. It introduces PMBR, which uses Alternating Least Squares (ALS) to recover the missing utilities and then runs standard MBR, achieving up to a 16x reduction in neural metric computations while preserving translation quality on WMT22 benchmarks. Empirical analyses show a dominant first singular value across metrics, which makes rank-1 approximations effective, and human evaluation corroborates the automatic metrics. The approach offers a practical pathway to deploying MBR in MT and potentially other NLG tasks, with room for exploring alternative completion algorithms and broader domains.
Abstract
Minimum Bayes Risk (MBR) decoding is a powerful decoding strategy widely used for text generation tasks, but its quadratic computational complexity limits its practical application. This paper presents a novel approach for approximating MBR decoding using matrix completion techniques, focusing on the task of machine translation. We formulate MBR decoding as a matrix completion problem, where the utility metric scores between candidate hypotheses and pseudo-reference translations form a low-rank matrix. First, we empirically show that the score matrices indeed have a low-rank structure. Then, we exploit this by computing only a random subset of the scores and efficiently recovering the missing entries in the matrix with the Alternating Least Squares (ALS) algorithm, thereby enabling a fast approximation of the MBR decoding process. Our experimental results on machine translation tasks demonstrate that the proposed method requires only 1/16 of the utility metric computations of vanilla MBR decoding while achieving equal translation quality, as measured by COMET22 on the WMT22 dataset (en<>de and en<>ru). We also benchmark our method against other approximation methods and show quality gains over them.
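The pipeline described in the abstract (score a random subset of hypothesis/pseudo-reference pairs, complete the matrix with ALS, then run standard MBR) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`random_mask`, `als_complete`, `mbr_select`), the ridge regularization term, the iteration count, and the uniform initialization are all hypothetical choices made for the example.

```python
import numpy as np


def random_mask(shape, budget=16, seed=0):
    """Mark roughly 1/budget of the entries as 'observed' (assumed sampling scheme)."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < 1.0 / budget


def als_complete(scores, mask, rank=1, reg=0.1, n_iters=20, seed=0):
    """Fill unobserved entries of `scores` with a rank-`rank` ALS fit.

    scores: (n, m) array; only entries where mask is True are read.
    mask:   (n, m) boolean array, True where the utility was computed.
    reg, n_iters and the initialization are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, m = scores.shape
    X = rng.random((n, rank))  # row (hypothesis) factors
    Y = rng.random((m, rank))  # column (pseudo-reference) factors
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix Y, solve a small ridge regression for each row factor.
        for i in range(n):
            cols = mask[i]
            A = Y[cols].T @ Y[cols] + ridge
            b = Y[cols].T @ scores[i, cols]
            X[i] = np.linalg.solve(A, b)
        # Fix X, solve a small ridge regression for each column factor.
        for j in range(m):
            rows = mask[:, j]
            A = X[rows].T @ X[rows] + ridge
            b = X[rows].T @ scores[rows, j]
            Y[j] = np.linalg.solve(A, b)
    # Keep the computed utilities, use the low-rank fit elsewhere.
    return np.where(mask, scores, X @ Y.T)


def mbr_select(utility):
    """Standard MBR: pick the hypothesis with the highest mean utility."""
    return int(np.argmax(utility.mean(axis=1)))
```

In use, one would compute the utility metric only where the mask is `True`, pass the partially filled matrix and the mask to `als_complete`, and call `mbr_select` on the result. The default `rank=1` mirrors the observation that the first singular value dominates; higher ranks are possible with the same code.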
