LogicDiff: Logic-Guided Denoising Improves Reasoning in Masked Diffusion Language Models

Shaik Aman

Abstract

Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens from a fully masked sequence, offering parallel generation and bidirectional context. However, their standard confidence-based unmasking strategy systematically defers high-entropy logical connective tokens (the critical branching points in reasoning chains), severely degrading reasoning performance. We introduce LogicDiff, an inference-time method that replaces confidence-based unmasking with logic-role-guided unmasking. A lightweight classification head (4.2M parameters, 0.05% of the base model) predicts the logical role of each masked position (premise, connective, derived step, conclusion, or filler) from the base model's hidden states with 98.4% accuracy. A dependency-ordered scheduler then unmasks tokens in order of logical dependency: premises first, then connectives, then derived steps, then conclusions. Without modifying a single parameter of the base model, and without any reinforcement learning or task-specific training, LogicDiff improves LLaDA-8B-Instruct accuracy from 22.0% to 60.7% on GSM8K (+38.7 percentage points) and from 23.6% to 29.2% on MATH-500 (+5.6 pp), with less than 6% runtime overhead. Our results demonstrate that a substantial portion of the reasoning deficit in MDLMs is attributable to suboptimal token unmasking order, not to limitations of the model's learned representations.
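As a concrete illustration of the two components described above, the following is a minimal PyTorch sketch. All names here (ROLES, LogicRoleHead, select_unmask), the hidden dimension, the head architecture, and the priority-minus-confidence scoring rule are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    # Logical roles listed in dependency order; a lower index unmasks earlier.
    ROLES = ["premise", "connective", "derived", "conclusion", "filler"]

    class LogicRoleHead(nn.Module):
        """Small classifier from frozen-model hidden states to logic roles."""
        def __init__(self, hidden_dim=4096, n_roles=len(ROLES)):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(hidden_dim, 512), nn.GELU(), nn.Linear(512, n_roles)
            )

        def forward(self, hidden_states):
            # (batch, seq_len, hidden_dim) -> (batch, seq_len, n_roles)
            return self.proj(hidden_states)

    def select_unmask(role_logits, confidence, masked, k):
        """Pick k masked positions: role priority first, confidence breaks ties."""
        priority = role_logits.argmax(-1).float()         # lower = earlier role
        score = priority - 0.1 * confidence               # small tie-break bonus
        score = score.masked_fill(~masked, float("inf"))  # skip revealed slots
        return score.topk(k, largest=False).indices

Because ROLES is listed in dependency order, the argmax index doubles as the unmasking priority; the scaled confidence term only orders positions within the same role.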

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: LogicDiff system architecture. At each denoising step, the frozen LLaDA model produces hidden states and token predictions. The logic role head classifies each masked position. The dependency scheduler computes priority scores and unmasks tokens in logical dependency order (see the loop sketch after this list).
  • Figure 2: Unmasking order comparison. Top: Default confidence-based unmasking generates numbers first and defers connectives ("so") to the last step, locking in the reasoning direction prematurely. Bottom: LogicDiff unmasks premises first, then connectives, then derived results, then conclusions, establishing logical structure before committing to values.
  • Figure 3: Accuracy comparison on GSM8K and MATH-500. LogicDiff improves over the baseline by +38.7 percentage points on GSM8K and +5.6 pp on MATH-500, using the same frozen model with no RL.
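To make Figure 1's data flow concrete, here is a hedged sketch of one LogicDiff generation loop. It reuses the hypothetical LogicRoleHead and select_unmask helpers above and assumes a frozen base model that returns (hidden states, logits) for a batch of token ids; that interface, the name logicdiff_generate, and the per-step reveal budget are assumptions, not LLaDA's actual API.

    import torch

    @torch.no_grad()
    def logicdiff_generate(model, role_head, seq_len, steps, mask_id):
        """Reveal tokens of a fully masked sequence in logical dependency order."""
        tokens = torch.full((seq_len,), mask_id, dtype=torch.long)
        masked = torch.ones(seq_len, dtype=torch.bool)
        per_step = max(1, seq_len // steps)               # reveal budget per step
        for _ in range(steps):
            if not masked.any():
                break
            # Frozen MDLM predicts every position from bidirectional context.
            hidden, logits = model(tokens.unsqueeze(0))   # assumed return signature
            confidence, prediction = logits[0].softmax(-1).max(-1)
            role_logits = role_head(hidden)[0]            # (seq_len, n_roles)
            k = min(per_step, int(masked.sum()))
            idx = select_unmask(role_logits, confidence, masked, k)
            tokens[idx] = prediction[idx]                 # commit in dependency order
            masked[idx] = False
        return tokens

Note that the base model is called exactly as in standard confidence-based decoding; only the selection of which positions to commit changes, which is consistent with the abstract's claim of under 6% runtime overhead.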