Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding

Yanzheng Xiang, Lan Wei, Yizhen Yao, Qinglin Zhu, Hanqi Yan, Chen Jin, Philip Alexander Teare, Dandan Zhang, Lin Gui, Amrutha Saseendran, Yulan He

TL;DR

The paper addresses inefficiencies in revocable diffusion decoding caused by flip-flop oscillations, where remasked tokens revert unchanged and waste revision budgets. It introduces COVER, a context-preserving, in-place verification mechanism that uses KV cache override with a diagonal correction to enable faithful leave-one-out verification within a single forward pass, plus stability-aware seed selection and an adaptive revision rate. Through extensive experiments across four dLLMs and multiple benchmarks, COVER achieves substantial speedups and accuracy gains by reducing ineffective revisions and stabilizing parallel drafting. This approach delivers a practical, training-free enhancement for multi-token decoding, with potential impact on real-time, high-quality diffusion-based generation systems.

Abstract

Parallel diffusion decoding can accelerate diffusion language model inference by unmasking multiple tokens per step, but aggressive parallelism often harms quality. Revocable decoding mitigates this by rechecking earlier tokens, yet we observe that existing verification schemes frequently trigger flip-flop oscillations, where tokens are remasked and later restored unchanged. This behaviour slows inference in two ways: remasking verified positions weakens the conditioning context for parallel drafting, and repeated remask cycles consume the revision budget with little net progress. We propose COVER (Cache Override Verification for Efficient Revision), which performs leave-one-out verification and stable drafting within a single forward pass. COVER constructs two attention views via KV cache override: selected seeds are masked for verification, while their cached key-value states are injected for all other queries to preserve contextual information, with a closed-form diagonal correction preventing self-leakage at the seed positions. COVER further prioritises seeds using a stability-aware score that balances uncertainty, downstream influence, and cache drift, and it adapts the number of verified seeds per step. Across benchmarks, COVER markedly reduces unnecessary revisions and yields faster decoding while preserving output quality.
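
The two attention views described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single head and a single layer, and the names `cover_attention`, `k_fresh`, `v_fresh`, `k_cache`, and `v_cache` are hypothetical. The diagonal correction shown here (a seed query scores and reads its own position through the fresh [MASK] state instead of the cached one) is an assumed form; the paper's closed-form correction may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cover_attention(q, k_fresh, v_fresh, k_cache, v_cache, seeds):
    """Single-head attention with KV cache override (illustrative only).

    q, k_fresh, v_fresh: (n, d) projections of the current input, in which
        the seed positions hold the [MASK] embedding.
    k_cache, v_cache: (n, d) states cached while the seeds were unmasked.
    seeds: list of integer positions being verified.
    """
    d = q.shape[-1]
    # Override: all queries see the cached (unmasked) K/V at seed positions,
    # so the context of non-seed queries is unchanged.
    k = k_fresh.copy()
    v = v_fresh.copy()
    k[seeds] = k_cache[seeds]
    v[seeds] = v_cache[seeds]
    scores = q @ k.T / np.sqrt(d)
    # Assumed diagonal correction, part 1: a seed query must not score its
    # own cached key, so its self-score uses the fresh [MASK] key instead.
    scores[seeds, seeds] = np.sum(q[seeds] * k_fresh[seeds], axis=-1) / np.sqrt(d)
    p = softmax(scores)
    out = p @ v
    # Part 2: swap the cached value for the fresh value on the diagonal.
    p_diag = p[seeds, seeds]
    out[seeds] += p_diag[:, None] * (v_fresh[seeds] - v_cache[seeds])
    return out
```

Under these assumptions, non-seed rows match ordinary attention over the cached context, while each seed row is independent of that seed's own cached state, which is the leave-one-out property the abstract describes.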

Paper Structure

This paper contains 21 sections, 2 theorems, 27 equations, 3 figures, 3 tables.

Key Result

Lemma 1.1

Assume decoding terminates with no [MASK] tokens, and drafting unmasks at most $B$ positions per step, namely $|\mathcal{D}_t|\le B$ for all $t$. Let $F$ be the total flip-flop count defined above. Then the number of decoding steps satisfies

Figures (3)

  • Figure 1: Flip-flop behaviour on HumanEval for Dream-Instruct-7B and LLaDA-Instruct-8B under two revocable baselines (Saber, WINO) and ours (COVER). Unlike baselines that repeatedly ReMask, COVER uses context-preserving in-place verification to reduce oscillatory revisions while maintaining generation quality.
  • Figure 2: Overview of our single-pass revocable diffusion decoding. At step $t$, the model drafts multiple masked positions in parallel and verifies a seed set selected from step $t\!-\!1$. Verification masks the seeds in the input but injects their cached $K,V$ states so non-seed queries see an unchanged context. An attention diagonal correction is applied at the masked seed positions to prevent self-leakage and enable re-prediction from the surrounding context. Each seed is then updated by Keep, Replace, or ReMask, and a stability-aware score based on uncertainty and in/out influence selects the next seed set via top-$k$.
  • Figure 3: Spearman rank correlation between the proposed stability proxy $d_{\mathrm{out}}$ and measured KV drift across diffusion models and tasks. Cell colour and the value indicate the correlation coefficient; values above $0.5$ suggest a strong monotonic relationship, supporting $d_{\mathrm{out}}$ as a stability proxy.
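
The Spearman rank correlation reported in Figure 3 can be computed as the Pearson correlation of the ranks. The sketch below is a plain-Python illustration; the function names and the sample values for $d_{\mathrm{out}}$ and KV drift are hypothetical and are not taken from the paper.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-token measurements (illustrative numbers only).
d_out = [0.10, 0.40, 0.35, 0.80, 0.95]
kv_drift = [0.02, 0.09, 0.11, 0.30, 0.28]
rho = spearman(d_out, kv_drift)
```

Because only ranks enter the computation, the coefficient measures monotonic agreement between the proxy and the drift, not linear agreement.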

Theorems & Definitions (4)

  • Lemma 1.1: Unmask budget overhead from flip-flop
  • proof
  • Lemma 2.1: Softmax under a single score change
  • proof
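
The statement of Lemma 2.1 is not reproduced in this listing. As background, the standard identity for a softmax in which exactly one score changes is as follows: if $p=\mathrm{softmax}(s)$ and score $s_j$ shifts by $\delta$, the normaliser is scaled by $1+p_j(e^{\delta}-1)$, so $p'_j = p_j e^{\delta}/(1+p_j(e^{\delta}-1))$ and $p'_i = p_i/(1+p_j(e^{\delta}-1))$ for $i\ne j$. The sketch below checks this identity numerically; the function names are hypothetical, and whether the paper's lemma takes exactly this form is an assumption.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_after_single_change(p, j, delta):
    """Update probabilities p when score j shifts by delta.

    Uses only the old probabilities, not the raw scores.
    """
    scale = 1.0 + p[j] * (math.exp(delta) - 1.0)
    updated = [pi / scale for pi in p]
    updated[j] = p[j] * math.exp(delta) / scale
    return updated

# Compare the closed form against recomputing the softmax from scratch.
scores = [0.3, -1.2, 2.0, 0.5]
j, delta = 2, -1.7
changed = list(scores)
changed[j] += delta
direct = softmax(changed)
closed = softmax_after_single_change(softmax(scores), j, delta)
```

An identity of this kind lets a single attention row be corrected after one score is replaced without recomputing the full softmax.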