Diver: Large Language Model Decoding with Span-Level Mutual Information Verification

Jinliang Lu, Chen Wang, Jiajun Zhang

TL;DR

Diver addresses a faithfulness gap in LLM decoding: outputs produced by standard decoding often deviate from the input. At divergence points, Diver generates candidate token spans of dynamic length and re-ranks them using both the model's likelihood and a span-level PMI score, measured as the gain in the input's log-likelihood once a candidate span is generated. The method is evaluated across multiple models on machine translation, constrained generation, QA, summarization, dialogue, story generation, and code generation. Empirical results show consistent gains over vanilla decoding and contrastive methods, with notable improvements for distant language pairs and high-information inputs, at the cost of additional decoding time. The work points to speed-ups such as smaller verification models and speculative decoding to balance quality and throughput.

Abstract

Large language models (LLMs) have shown impressive capabilities in adapting to various tasks when provided with task-specific instructions. However, LLMs using standard decoding strategies often struggle with deviations from the inputs. Intuitively, compliant LLM outputs should reflect the information present in the input, which can be measured by point-wise mutual information (PMI) scores. Therefore, we propose Diver, a novel approach that enhances LLM Decoding through span-level PMI verification. During inference, Diver first identifies divergence steps that may lead to multiple candidate spans. Subsequently, it calculates the PMI scores by assessing the log-likelihood gains of the input if the candidate spans are generated. Finally, the optimal span is selected based on the PMI re-ranked output distributions. We evaluate our method across various downstream tasks, and empirical results demonstrate that Diver significantly outperforms existing decoding methods in both performance and versatility.
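The re-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the PMI score of a candidate span is the input's log-likelihood with the span appended minus the input's log-likelihood without it, and that this score is added to the span's own log-probability with a weight. The function name `rerank_spans`, the `weight` parameter, the additive combination, and all numbers below are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math

def rerank_spans(candidates, base_input_logp, weight=1.0):
    """Re-rank candidate spans by span log-probability plus weighted PMI.

    candidates: list of (span, span_logp, input_logp_with_span) tuples, where
      span_logp            = log p(span | input, prefix)   (model likelihood)
      input_logp_with_span = log p(input | prefix + span)
    base_input_logp:        log p(input | prefix)
    weight: hypothetical mixing coefficient (an assumption, not from the paper).
    Returns (best_span, ranked), where ranked holds (span, prob, pmi) tuples.
    """
    scored = []
    for span, span_logp, input_logp_with_span in candidates:
        # PMI as the log-likelihood gain of the input once the span is generated.
        pmi = input_logp_with_span - base_input_logp
        scored.append((span, span_logp + weight * pmi, pmi))

    # Softmax over the combined scores gives the re-ranked distribution.
    top = max(score for _, score, _ in scored)
    norm = sum(math.exp(score - top) for _, score, _ in scored)
    ranked = [(span, math.exp(score - top) / norm, pmi) for span, score, pmi in scored]
    best_span = max(ranked, key=lambda item: item[1])[0]
    return best_span, ranked

# Toy values (made up): two candidate spans at one divergence point.
toy_candidates = [
    ("thought", math.log(0.6), -11.5),
    ("and Mary thought", math.log(0.4), -9.0),
]
best, ranked = rerank_spans(toy_candidates, base_input_logp=-12.0)
baseline, _ = rerank_spans(toy_candidates, base_input_logp=-12.0, weight=0.0)
print(best, baseline)
```

With these toy values, the span that is less likely under the model alone is selected once the PMI term is included, while setting `weight=0.0` falls back to picking the span with the highest model likelihood.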

Paper Structure

This paper contains 33 sections, 14 equations, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: Verification based on the disparity of a single token may lead to a locally optimal outcome, such as generating "thought" at the current decoding step (1). However, if the LLM generates "and" instead, "thought" can still appear among the subsequent tokens (subsequent encapsulation (2)), potentially leading to a better translation. $^{\dagger}$ The standard reference for the input $x$ is "Lily and Mary thought it was very safe here."
  • Figure 2: An overview of Diver. It first identifies the divergence points and generates several candidate spans. Then, it computes the delta $\Delta$ of the log-likelihood of input $x$ (PMI scores) for the distribution re-ranking. Finally, a token span is selected based on the re-ranked distribution.
  • Figure 3: An example illustrating dynamic span acquisition. Blue and green stars refer to the first-emerged risk points in the two sequences.
  • Figure 4: Human judgments on (a) the selection of the most faithful translation among different decoding methods on Flores Zh-En and (b) the win/tie/loss rates of Diver compared with other decoding methods on E2E.
  • Figure 5: Performance improvements on E2E achieved by using Diver$_{\text{R}}$ across various LLMs.
  • ...and 1 more figure