CoRe: Context-Robust Remasking for Diffusion Language Models
Kevin Zhai, Sabbir Mollah, Zhenyi Wang, Mubarak Shah
TL;DR
CoRe addresses context rigidity in Masked Diffusion Language Models by reframing revision as a problem of robustness to context changes. It is a training-free, inference-time framework that stress-tests tokens with masked-context perturbations and selects the most unstable ones for revision using an efficient, margin-guided approximation. Empirically, CoRe yields consistent gains across reasoning and code benchmarks, including up to 9.2 percentage points on MBPP, at the cost of only a modest number of additional forward passes, and it avoids the degradation observed with stale-confidence baselines. The approach emphasizes structural consistency and may improve diffusion-based decoding in practical, latency-aware settings.
Abstract
Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full context. This creates cascade effects where initial inconsistencies misguide the remaining generation. Existing revision strategies attempt to mitigate this by relying on static confidence scores, but these signals are inherently myopic; inconsistent tokens can appear confident to the model itself. We propose Context-Robust Remasking (CoRe), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CoRe identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. We formalize revision as a robust optimization objective over context shifts and efficiently approximate this objective to prioritize unstable tokens for revision. On LLaDA-8B-Base, CoRe delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
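The idea of probing a token's sensitivity to masked-context perturbations can be illustrated with a minimal sketch. It is not the authors' implementation: the leave-one-out perturbation set, the worst-case log-probability drop used as the instability score, the toy denoiser `toy_model`, and the helper names `instability`, `rank_brittle_tokens`, and `remask` are all assumptions made for illustration. It also does not attempt to reproduce the margin-guided approximation, whose details are not given in this section.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

MASK = None  # hypothetical mask marker for this sketch
Token = Optional[str]
Model = Callable[[Sequence[Token], int], Dict[str, float]]


def toy_model(tokens: Sequence[Token], pos: int) -> Dict[str, float]:
    """Toy stand-in for a denoiser (assumption): a prior favouring "a",
    plus a vote from each visible adjacent neighbour for its own token."""
    weights = {"a": 2.0, "b": 1.0}
    for j in (pos - 1, pos + 1):
        if 0 <= j < len(tokens) and tokens[j] is not MASK:
            weights[tokens[j]] += 3.0
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def instability(model: Model, tokens: Sequence[Token], pos: int) -> float:
    """Worst-case drop in log-probability of tokens[pos] when one other
    committed token at a time is masked (leave-one-out perturbations)."""
    def logp(context: Sequence[Token]) -> float:
        probe = list(context)
        probe[pos] = MASK  # the token under test is always hidden
        return math.log(max(model(probe, pos).get(tokens[pos], 0.0), 1e-12))

    base = logp(tokens)
    worst = base
    for j, tok in enumerate(tokens):
        if j == pos or tok is MASK:
            continue
        perturbed = list(tokens)
        perturbed[j] = MASK
        worst = min(worst, logp(perturbed))
    return base - worst


def rank_brittle_tokens(model: Model, tokens: Sequence[Token],
                        num_remask: int) -> List[int]:
    """Return the positions of the num_remask most context-brittle tokens."""
    committed = [i for i, tok in enumerate(tokens) if tok is not MASK]
    scores = {i: instability(model, tokens, i) for i in committed}
    return sorted(committed, key=lambda i: -scores[i])[:num_remask]


def remask(tokens: Sequence[Token], positions: Sequence[int]) -> List[Token]:
    """Return a copy of tokens with the chosen positions masked for revision."""
    out = list(tokens)
    for i in positions:
        out[i] = MASK
    return out


sequence = ["a", "a", "a", "b", "b"]
brittle = rank_brittle_tokens(toy_model, sequence, num_remask=1)
print(brittle)                    # [3]
print(remask(sequence, brittle))  # ['a', 'a', 'a', None, 'b']
```

In this toy setting the "b" at position 3 is ranked most brittle because its probability depends almost entirely on a single supporting neighbour. Exhaustive leave-one-out probing costs one forward pass per pair of committed tokens, which is why a cheaper approximation of the kind the abstract describes would be needed in practice.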
