Diver: Large Language Model Decoding with Span-Level Mutual Information Verification
Jinliang Lu, Chen Wang, Jiajun Zhang
TL;DR
Diver tackles the fidelity gap in LLM decoding by introducing span-level point-wise mutual information (PMI) verification. At divergence points, it selects among candidate spans using both the standard model likelihood and the PMI gain, measured as the increase in the input's log-likelihood when a candidate span is generated. By generating dynamic token spans and re-ranking them with PMI, Diver improves faithfulness across a wide range of tasks and models, including machine translation, constrained generation, QA, summarization, dialogue, story generation, and code generation. Empirical results show consistent gains over vanilla decoding and contrastive methods, with notable improvements for distant language pairs and high-information inputs, though at the cost of additional decoding time. The work highlights the practicality of span-level verification and points to speed-up avenues such as smaller verification models and speculative decoding to balance quality and throughput.
Abstract
Large language models (LLMs) have shown impressive capabilities in adapting to various tasks when provided with task-specific instructions. However, LLMs using standard decoding strategies often produce outputs that deviate from the input. Intuitively, compliant LLM outputs should reflect the information present in the input, which can be measured by point-wise mutual information (PMI) scores. Therefore, we propose Diver, a novel approach that enhances LLM Decoding through span-level PMI verification. During inference, Diver first identifies divergence steps that may lead to multiple candidate spans. Subsequently, it calculates the PMI scores by assessing the log-likelihood gains of the input if the candidate spans are generated. Finally, the optimal span is selected based on the PMI re-ranked output distributions. We evaluate our method across various downstream tasks, and empirical results demonstrate that Diver significantly outperforms existing decoding methods in both task performance and versatility.
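
The three steps described in the abstract (detect a divergence step, score each candidate span by PMI, re-rank) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` class, the threshold rule in `is_divergence_step`, the `weight` parameter, and the additive combination of span log-likelihood and PMI are all assumptions made for illustration, and the log-probabilities are hand-picked numbers rather than model outputs. The only part taken from the text is the definition of PMI as the log-likelihood gain of the input once a candidate span is generated.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one candidate span and its scores."""
    span: str                      # candidate continuation text
    log_p_span: float              # log p(span | input, prefix) from the LLM
    log_p_input_given_span: float  # log p(input | prefix + span)


def is_divergence_step(token_probs, threshold=0.1):
    """Assumed rule: a step diverges when two or more next-token
    probabilities reach the threshold, so several spans are plausible."""
    return sum(p >= threshold for p in token_probs) >= 2


def pmi(candidate, log_p_input):
    """PMI as the log-likelihood gain of the input given the span:
    log p(input | span) - log p(input)."""
    return candidate.log_p_input_given_span - log_p_input


def select_span(candidates, log_p_input, weight=1.0):
    """Re-rank candidates by span log-likelihood plus weighted PMI
    (the additive form and `weight` are assumptions)."""
    return max(
        candidates,
        key=lambda c: c.log_p_span + weight * pmi(c, log_p_input),
    )


# Toy numbers: the less likely span explains the input better.
candidates = [
    Candidate("a red car", math.log(0.6), -12.0),
    Candidate("a blue car", math.log(0.4), -9.0),
]
log_p_input = -11.0  # log p(input) before any candidate span is appended

if is_divergence_step([0.6, 0.4]):
    best = select_span(candidates, log_p_input)
    print(best.span)
```

With `weight=0.0` the selection falls back to plain likelihood and picks the other span, which shows how the PMI term can overturn the model's default preference when a candidate makes the input more predictable.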
