dVoting: Fast Voting for dLLMs
Sicheng Feng, Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
TL;DR
This paper tackles the high inference cost of test-time scaling in diffusion large language models (dLLMs). It introduces dVoting, a training-free voting strategy that uses remask sampling and token-consistency analysis to iteratively refine uncertain tokens and aggregate multiple candidate generations by voting. The key empirical observation is that tokens repeated across samples indicate redundancy, which the authors quantify with the Non-Unique Position Rate ($NUPR@k$). Focusing sampling on the uncertain positions yields consistent gains on GSM8K, MATH500, ARC-C, and MMLU with modest overhead. The approach achieves a favorable performance–efficiency trade-off, outperforms baselines such as HEX and RFG, and generalizes to RL-enhanced models, offering a practical baseline for efficient test-time scaling in dLLMs under limited computational budgets.
Abstract
Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, which gives them significant potential for parallel test-time scaling, an approach previously constrained by severe inefficiency in autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training and with only modest additional computational overhead. dVoting is motivated by the observation that, across multiple samples for the same prompt, token predictions remain largely consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement by sampling, identifying uncertain tokens via consistency analysis, regenerating them through voting, and repeating this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks, achieving gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting.
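As a rough illustration of the consistency analysis described above, the sketch below compares several equal-length samples position by position, flags positions where the samples disagree, builds a remasked template for those positions, and takes a majority vote over final answers. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `<mask>` placeholder, the agreement threshold, and the assumption that samples are aligned token lists of equal length are all hypothetical, and the actual dLLM sampling step is left out.

```python
from collections import Counter

MASK = "<mask>"  # hypothetical placeholder for a position to be regenerated


def position_consistency(samples):
    """Return (majority_token, agreement_fraction) for every position.

    Assumes all samples are token lists of the same length.
    """
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same length")
    stats = []
    for pos in range(length):
        counts = Counter(sample[pos] for sample in samples)
        token, freq = counts.most_common(1)[0]
        stats.append((token, freq / len(samples)))
    return stats


def uncertain_positions(samples, threshold=1.0):
    """Positions whose majority agreement falls below the threshold."""
    stats = position_consistency(samples)
    return [i for i, (_, agree) in enumerate(stats) if agree < threshold]


def remask_template(samples, threshold=1.0):
    """Keep consistent tokens and replace uncertain positions with MASK."""
    stats = position_consistency(samples)
    return [tok if agree >= threshold else MASK for tok, agree in stats]


def majority_vote(answers):
    """Return the most frequent final answer among the candidates."""
    return Counter(answers).most_common(1)[0][0]
```

In an iterative loop, the template returned by `remask_template` would be handed back to the model so that only the masked positions are resampled, and the loop would stop once `uncertain_positions` returns an empty list or a round budget is exhausted. With the default threshold of 1.0, any disagreement marks a position as uncertain; a lower threshold would tolerate minority disagreement.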
