CoRe: Context-Robust Remasking for Diffusion Language Models

Kevin Zhai; Sabbir Mollah; Zhenyi Wang; Mubarak Shah

TL;DR

CoRe addresses context rigidity in Masked Diffusion Language Models by reframing revision as a problem of robustness to context changes. It is a training-free, inference-time framework that stress-tests tokens via masked-context perturbations and targets the most unstable ones for revision using an efficient, margin-guided approximation. Empirically, CoRe yields consistent gains across reasoning and code benchmarks on LLaDA-8B-Base, improving MBPP by up to 9.2 percentage points with only modest additional forward passes, and it avoids the degradation observed with stale-confidence baselines. Because it prioritizes structural consistency at low added cost, the approach suits latency-aware decoding settings.

Abstract

Standard decoding in Masked Diffusion Models (MDMs) is hindered by context rigidity: tokens are retained based on transient high confidence, often ignoring that early predictions lack full context. This creates cascade effects where initial inconsistencies misguide the remaining generation. Existing revision strategies attempt to mitigate this by relying on static confidence scores, but these signals are inherently myopic; inconsistent tokens can appear confident to the model itself. We propose Context-Robust Remasking (CoRe), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CoRe identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. We formalize revision as a robust optimization objective over context shifts and efficiently approximate this objective to prioritize unstable tokens for revision. On LLaDA-8B-Base, CoRe delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
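To make the idea of choosing which already-unmasked tokens to stress-test more concrete, here is a minimal sketch of one possible margin-based candidate filter. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not define the margin, so the top-1 minus top-2 probability gap used here is an assumption, and the names `top2_margin`, `lowest_margin_candidates`, and the candidate budget `m` as a function argument are hypothetical.

```python
from typing import List, Sequence


def top2_margin(dist: Sequence[float]) -> float:
    """Gap between the largest and second-largest probability (assumed margin)."""
    first, second = sorted(dist, reverse=True)[:2]
    return first - second


def lowest_margin_candidates(
    probs: Sequence[Sequence[float]], unmasked: Sequence[int], m: int
) -> List[int]:
    """Return up to m unmasked positions with the smallest top-2 margin."""
    ranked = sorted(unmasked, key=lambda i: top2_margin(probs[i]))
    return sorted(ranked[:m])


# Toy per-position distributions over a 3-token vocabulary.
probs = [
    [0.90, 0.05, 0.05],  # position 0: wide margin
    [0.45, 0.40, 0.15],  # position 1: narrow margin
    [0.60, 0.30, 0.10],  # position 2: medium margin
    [0.34, 0.33, 0.33],  # position 3: still masked, so never a candidate
]
candidates = lowest_margin_candidates(probs, unmasked=[0, 1, 2], m=2)
print(candidates)
```

Only positions that are already unmasked are eligible, since the goal is to revisit committed tokens rather than to schedule new ones.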

Paper Structure (33 sections, 17 equations, 8 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Problem Formulation
Preliminaries of Masked Diffusion Models.
Context Rigidity Anchors Structural Inconsistencies.
Uncertainty Heuristics Become Stale.
Brittleness is Distinct from Uncertainty.
Method
Context-Robust Token Remasking Framework
Context Shifts are Simulated via Perturbation.
Instability Scores Quantify Context Sensitivity.
Perturbed Context Distribution.
An Efficient Remasking Algorithm
Tractable Approximation via Deterministic Masking.
Revision Targets the Most Unstable Tokens.
...and 18 more sections

Figures (8)

Figure 1: Illustration of Context-Robust Remasking (CoRe). Our method operates on the current state $y^{(t)}$, where the response is partially unmasked. (Left) Selection: We select potentially brittle unmasked tokens (red dashed box) to test for stability, distinct from the next token scheduled for unmasking (blue dashed box). (Top-Right) CoRe Mechanism: We mask the selected tokens to create a Perturbed Context $\tilde{y}^{(t)}$ and then compute their Instability Scores under this new context. The token "a" is found to be the most brittle (high instability) and is updated to "an," which is the most likely token given the perturbed context. (Bottom-Right) Base Unmasking: In parallel, the base model uses the original context $y^{(t)}$ to predict the next token ("icy"); this newly unmasked token ("icy") is combined with the updated token ("an") to form the Next State $y^{(t+1)}$, yielding the contextually consistent phrase "an icy."
Figure 2: Moderate candidate set size balances coverage and precision. We evaluate greedy pass@1 accuracy under Low-Confidence unmasking. Performance peaks at $m\!=\!32$; expanding the candidate set to $m\!=\!64$ degrades results, suggesting that widening the perturbation scope introduces false positives (remasking already-consistent tokens) rather than resolving inconsistencies.
Figure 3: Instability Scores Cleanly Separate Stable and Brittle Tokens. Density of instability scores $\ell_i$ computed in the perturbation step (by simultaneously masking each candidate subset $S_t$) for candidate positions that are stable (unchanged) versus brittle (revised) on (a) BBH (reasoning) and (b) MBPP (code). Unchanged positions concentrate tightly near $\ell_i \approx 0$, while revised positions form a distinct heavy tail. This separation indicates that $\ell_i$ serves as a high-precision filter, targeting the small fraction of tokens ($<2\%$) that lack structural anchoring in the surrounding context.
Figure 4: CoRe Resolves Structural Inconsistencies Locked by Standard Decoding. The base model commits to a syntax error ("= =") early in generation. CoRe identifies the conflicting tokens as context-brittle and invokes revision, recovering the valid, contextually stable syntax list().
Figure 5: CoRe fixes error in output format. CoRe corrects the wrong answer "151" to the correct "51" by replacing the token "1" with a space. In contrast, ReMDM-conf focuses on the token $\texttt{<|endoftext|>}$, which is unrelated to the error, and fails to correct the underlying mistake.
...and 3 more figures
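The perturb-and-score step described in the Figure 1 caption can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `toy_model` is a hypothetical stand-in for the diffusion model, the instability score is assumed to be the negative log-probability of the current token under the perturbed context (the captions only say that stable tokens score near 0), and `core_revision_step` and `k` are illustrative names. For simplicity the word "icy" is already present in the context here, whereas in Figure 1 it is unmasked in parallel by the base model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

MASK = "<mask>"
Dist = Dict[str, float]
Model = Callable[[Sequence[str]], List[Dist]]


def toy_model(tokens: Sequence[str]) -> List[Dist]:
    """Hypothetical model: one distribution per position, given the context."""
    dists: List[Dist] = []
    for i, tok in enumerate(tokens):
        if tok != MASK:
            dists.append({tok: 1.0})  # visible tokens are simply echoed
            continue
        right = tokens[i + 1] if i + 1 < len(tokens) else ""
        if right and right[0].lower() in "aeiou":
            dists.append({"an": 0.9, "a": 0.1})  # article before a vowel
        else:
            dists.append({"was": 0.95, "is": 0.05})
    return dists


def core_revision_step(
    tokens: Sequence[str],
    candidates: Sequence[int],
    model: Model,
    k: int = 1,
    eps: float = 1e-9,
) -> Tuple[List[str], Dict[int, float]]:
    """Mask all candidates at once, score them, and revise the k most unstable."""
    perturbed = list(tokens)
    for i in candidates:
        perturbed[i] = MASK  # simultaneous masking builds the perturbed context
    dists = model(perturbed)
    # Assumed score: how unlikely the current token is under the perturbed context.
    scores = {
        i: -math.log(max(dists[i].get(tokens[i], 0.0), eps)) for i in candidates
    }
    most_unstable = sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]
    revised = list(tokens)
    for i in most_unstable:
        # Replace with the most likely token given the perturbed context.
        revised[i] = max(dists[i], key=dists[i].get)
    return revised, scores


state = ["It", "was", "a", "icy", "day"]
revised, scores = core_revision_step(state, candidates=[1, 2], model=toy_model)
print(revised)
print(scores)
```

In this toy run the token "was" stays close to a score of 0 while "a" scores much higher, so only "a" is revised, mirroring the "a" to "an" correction in the caption. A single forward pass on the perturbed context scores every candidate, which is one way to keep the added compute modest.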
