Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language Models

Yu Zhang, Xinchen Li, Jialei Zhou, Hongnan Ma, Zhongwei Wan, Yiwei Shi, Duoqian Miao, Qi Zhang, Longbing Cao

TL;DR

Swordsman tackles the semantic misalignment of fixed-block partitioning in diffusion language models by introducing entropy-driven adaptive block partitioning. It detects constituent boundaries through entropy shifts $\Delta H_i = H_{i+1} - H_i$ and adaptively partitions blocks, while employing dynamic, difficulty-aware unmasking thresholds to balance speed and accuracy. The framework is training-free and KV Cache-compatible, achieving state-of-the-art results across GSM8K, HumanEval, MBPP, and Dream benchmarks with competitive latency. This approach enhances practical diffusion decoding by aligning block structure with linguistic constituents, enabling faster, more reliable generation without model retraining.
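
The entropy-shift signal above can be illustrated with a minimal sketch. It assumes per-position Shannon entropies computed from the model's predicted token distributions and places a block boundary after the position with the largest signed shift $\Delta H_i$. The function names, the `max_len` cap, and the choice of the signed (rather than absolute) shift are illustrative assumptions, not details confirmed by the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one predicted token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_shifts(entropies):
    """Delta H_i = H_{i+1} - H_i for every adjacent pair of positions."""
    return [entropies[i + 1] - entropies[i] for i in range(len(entropies) - 1)]


def pick_block_end(entropies, max_len=None):
    """Index of the last position of the next block (hypothetical helper).

    Assumption: the cut goes right after the position i with the largest
    Delta H_i, optionally restricted to blocks of at most max_len tokens.
    """
    if not entropies:
        raise ValueError("need at least one position")
    shifts = entropy_shifts(entropies)
    limit = len(shifts) if max_len is None else min(len(shifts), max_len)
    if limit <= 0:
        # Only one position left (or no room to cut): take everything.
        return len(entropies) - 1
    return max(range(limit), key=lambda i: shifts[i])


# Toy entropies for five still-masked positions.
H = [0.2, 0.3, 0.25, 1.4, 1.5]
end = pick_block_end(H)
block, rest = H[: end + 1], H[end + 1:]
```

In this toy input the largest jump is between the third and fourth positions, so the first three positions form one block and the remainder is left for later partitioning steps.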

Abstract

Block-wise decoding effectively improves the inference speed and quality in diffusion language models (DLMs) by combining inter-block sequential denoising and intra-block parallel unmasking. However, existing block-wise decoding methods typically partition blocks in a rigid and fixed manner, which inevitably fragments complete semantic or syntactic constituents, leading to suboptimal performance. Inspired by the entropy reduction hypothesis (ERH), we recognize that constituent boundaries offer greater opportunities for uncertainty reduction, which motivates us to employ entropy analysis for identifying constituent boundaries. Therefore, we propose Swordsman, an entropy-driven adaptive block-wise decoding framework for DLMs. Swordsman adaptively partitions blocks by identifying entropy shifts between adjacent tokens to better align with semantic or syntactic constituent boundaries. In addition, Swordsman dynamically adjusts unmasking thresholds conditioned on the real-time unmasking status within a block, further improving both efficiency and stability. As a training-free framework, supported by KV Cache, Swordsman demonstrates state-of-the-art performance across extensive evaluations.
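
As a rough illustration of a threshold that depends on the unmasking status within a block, the sketch below relaxes the confidence threshold linearly from an initial value toward a floor as more of the block is unmasked. The linear schedule, the `tau_min` floor, both function names, and the fallback of always unmasking the single most confident token are assumptions made for this example; the abstract does not specify the actual rule.

```python
def dynamic_threshold(tau_init, num_unmasked, block_len, tau_min=0.5):
    """Hypothetical schedule: start at tau_init, decay linearly to tau_min
    as the fraction of already-unmasked tokens in the block approaches 1."""
    frac = num_unmasked / block_len
    return tau_min + (tau_init - tau_min) * (1.0 - frac)


def select_to_unmask(confidences, masked, tau):
    """Positions to unmask in this step.

    confidences: per-position confidence of the top predicted token.
    masked: per-position flags, True if the position is still masked.
    Assumption: if nothing clears the threshold, the most confident
    masked position is unmasked anyway so decoding always progresses.
    """
    chosen = [i for i, (c, m) in enumerate(zip(confidences, masked))
              if m and c >= tau]
    if not chosen:
        still_masked = [i for i, m in enumerate(masked) if m]
        if still_masked:
            chosen = [max(still_masked, key=lambda i: confidences[i])]
    return chosen


# One decoding step on a 4-token block where one token is already unmasked.
masked = [True, True, False, True]
conf = [0.95, 0.6, 0.8, 0.3]
tau = dynamic_threshold(tau_init=0.9, num_unmasked=1, block_len=4)
picked = select_to_unmask(conf, masked, tau)
```

A looser threshold late in a block trades a little caution for fewer denoising steps, which is one plausible way to balance speed and stability.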

Paper Structure

This paper contains 21 sections, 16 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Entropy evolution during decoding reveals semantic boundaries through shifts. Fast-dLLM (a) ignores entropy variance, applying predetermined fixed-length blocks that frequently fragment coherent constituents or merge semantically distinct ones, thereby degrading generation accuracy. Swordsman (b), however, leverages entropy shifts to adaptively align block boundaries with semantic constituents, partitioning at maximum shift points to achieve precise segmentation that yields better generations.
  • Figure 2: Visualization of the alignment between entropy-driven block partitioning and the constituency-parse structure, showing that entropy-shift boundaries consistently match semantic constituents.
  • Figure 3: Entropy-driven adaptive partitioning for semantic constituents: each step splits at the maximum entropy shift, decodes the block with a dynamic threshold, refreshes entropy, and repeats.
  • Figure 4: Trade-off between throughput and accuracy.
  • Figure 5: Accuracy and latency comparison between fixed and dynamic thresholds with varying $\tau_{\text{init}}$.