Understanding Diffusion Models via Code Execution
Cheng Yu
TL;DR
The paper bridges the gap between diffusion-model theory and practical implementation with an execution-focused, ~300-line Python implementation covering DDPM, DDIM, and classifier-free guidance (CFG). It presents a minimal U-Net with time embeddings and explicit forward and reverse diffusion processes, linking each code component to its mathematical counterpart. The work documents training and sampling pipelines on MNIST, Fashion-MNIST, and CIFAR-10, and shows how CFG enables conditional generation without extensive engineering. The result is a compact, implementation-first blueprint that makes the correspondence between theory and runnable code explicit, giving researchers and practitioners a starting point for experimenting with diffusion-based generative models.
Abstract
Diffusion models have achieved remarkable performance in generative modeling, yet their theoretical foundations are often intricate, and the gap between mathematical formulations in papers and practical open-source implementations can be difficult to bridge. Existing tutorials primarily focus on deriving equations, offering limited guidance on how diffusion models actually operate in code. To address this, we present a concise implementation of approximately 300 lines that explains diffusion models from a code-execution perspective. Our minimal example preserves the essential components (forward diffusion, reverse sampling, the noise-prediction network, and the training loop) while removing unnecessary engineering details. This technical report aims to provide researchers with a clear, implementation-first understanding of how diffusion models work in practice and how code and theory correspond. Our code and pre-trained models are available at: https://github.com/disanda/GM/tree/main/DDPM-DDIM-ClassifierFree.
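As a rough illustration of the components named above, the sketch below shows closed-form forward diffusion, the noise-prediction loss, and a single DDPM reverse step. It is not taken from the linked repository: the function names (linear_beta_schedule, q_sample, p_sample_step, noise_mse) are hypothetical, the linear schedule from 1e-4 to 0.02 is a common DDPM default assumed here, and plain Python lists stand in for the tensors and U-Net a real implementation would use.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def noise_mse(eps_pred, eps):
    """Training objective: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def p_sample_step(xt, t, eps_pred, betas, alpha_bars, rng):
    """One DDPM reverse step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta)
            for x, e in zip(xt, eps_pred)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

In a training loop, eps_pred would come from the noise-prediction network evaluated on (x_t, t), and noise_mse would be minimized over randomly drawn timesteps. At sampling time, p_sample_step would be applied from t = T - 1 down to t = 0, starting from pure Gaussian noise.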
