Dream-Coder 7B: An Open Diffusion Language Model for Code

Zhihui Xie, Jiacheng Ye, Lin Zheng, Jiahui Gao, Jingwei Dong, Zirui Wu, Xueliang Zhao, Shansan Gong, Xin Jiang, Zhenguo Li, Lingpeng Kong

TL;DR

This work introduces Dream-Coder 7B, the first open-source discrete diffusion language model for code with emergent adaptive generation patterns. It combines AR-based initialization via a shift operation with a continuous-time diffusion objective, followed by supervised fine-tuning and reinforcement learning with verifiable rewards to enhance reasoning. On standard coding and reasoning benchmarks, it performs competitively with autoregressive baselines, and on complex tasks it can generate code in flexible, non-left-to-right orders. The authors provide full training recipes, preprocessing pipelines, and inference code to support reproducibility and further research.

Abstract

We present Dream-Coder 7B, an open-source discrete diffusion language model for code generation that exhibits emergent any-order generation capabilities. Unlike traditional autoregressive (AR) models that decode strictly left-to-right, Dream-Coder 7B adaptively determines its decoding strategy based on the coding task: sketch-first generation for complex algorithms, left-to-right generation for straightforward completions, and interleaved reasoning generation for code understanding tasks. We adapt a pretrained AR checkpoint to a discrete diffusion framework with a continuous-time weighted cross-entropy objective. Our post-training recipe comprises (i) supervised fine-tuning, where we mitigate padding pathologies via random truncation and a padding penalty to improve sample efficiency and stabilize generation; and (ii) reinforcement learning with verifiable rewards over a curated high-quality prompt set drawn from open-source datasets, using a tailored reinforcement learning recipe for diffusion language models. The resulting Dream-Coder 7B Instruct attains 21.4% pass@1 on LiveCodeBench (2410-2505) and demonstrates competitive performance on HumanEval, MBPP, BigCodeBench, and CRUXEval. We release Dream-Coder-7B and Dream-Coder-7B-Instruct checkpoints, training recipes, preprocessing pipelines, and inference code to facilitate reproducibility and further research.
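To make the phrase "continuous-time weighted cross-entropy objective" more concrete, the following is a minimal sketch of a masked-diffusion loss of the kind commonly used for discrete diffusion language models. It is not taken from the paper: the weight w(t) = 1/t, the `MASK` id, and the function names `mask_tokens` and `weighted_masked_ce` are all assumptions made for illustration, and the shift operation used for AR-based initialization is omitted. The idea is to sample a noise level t in (0, 1], mask each token independently with probability t, and score the model only on the masked positions.

```python
import math
import random

MASK = -1  # hypothetical id standing in for the [MASK] token


def mask_tokens(tokens, t, rng):
    """Independently replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def weighted_masked_ce(tokens, noisy, log_probs, t):
    """Cross-entropy over masked positions only, scaled by w(t) = 1/t.

    log_probs[i][v] is the model's log-probability of token v at position i.
    The 1/t weight is an assumed choice, not a detail stated in the abstract.
    """
    loss = 0.0
    for i, (tok, noisy_tok) in enumerate(zip(tokens, noisy)):
        if noisy_tok == MASK:
            loss -= log_probs[i][tok]
    # Normalize by sequence length and apply the time-dependent weight.
    return loss / (t * len(tokens))


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [0, 1, 2, 3]
    t = rng.uniform(0.05, 1.0)  # continuous noise level, kept away from 0
    noisy = mask_tokens(tokens, t, rng)
    # Stand-in for a model: uniform predictions over a 4-token vocabulary.
    uniform = [[math.log(0.25)] * 4 for _ in tokens]
    print(weighted_masked_ce(tokens, noisy, uniform, t))
```

In a real training loop the log-probabilities would come from the network's forward pass on the noisy sequence, and the loss would be averaged over a batch of sequences with independently sampled t.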

Paper Structure

This paper contains 26 sections, 4 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of the Dream-Coder 7B post-training pipeline, showing supervised fine-tuning on Ling-Coder-SFT data followed by reinforcement learning with verifiable rewards using variants of the GRPO algorithm.
  • Figure 2: Generation patterns exhibited by Dream-Coder 7B Instruct across different coding tasks. Colors encode the generation order during decoding (light to dark: first to last), revealing diverse non-autoregressive strategies.
  • Figure 3: