Exploring the Power of Diffusion Large Language Models for Software Engineering: An Empirical Investigation
Jingyao Zhang, Tianlin Li, Xiaoyu Zhang, Qiang Hu, Bin Shi
TL;DR
Diffusion LLMs offer global bidirectional encoding and step-length decoupling, addressing the left-to-right limitations and latency of autoregressive LLMs in software engineering. The study conducts a large-scale, cross-SDLC evaluation using six benchmarks (52,937 tasks) and compares Mercury-Diffusion 7B against Llama-3-8B, finding an average effectiveness gain of about 30% and substantial latency reductions. DLLMs show strong advantages in long-range, multi-file contexts, particularly for code generation and cross-file repair, while Bears-detection highlights dataset challenges. The work establishes DLLMs as a practical, superior paradigm for SE tasks and motivates future SE-specific training and hybrid architectures.
Abstract
Autoregressive Large Language Models (AR-LLMs) are widely used in software engineering (SE) but face limitations in processing code structure information and suffer from high inference latency. Diffusion LLMs (DLLMs) offer a promising alternative with global bidirectional encoding and decoupled generation steps. This work presents the first comprehensive evaluation of DLLMs across the software development lifecycle, including code generation, defect detection, and program repair. On a large-scale benchmark of 52,937 tasks, 7B-parameter DLLMs outperform AR-LLMs with a 30% average accuracy improvement, achieving a 113% gain on cross-file repair, while maintaining superior efficiency and reduced latency. Our results establish DLLMs as a superior paradigm for SE tasks.
