From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting

Umang Agarwal, Rudraksh Sangore, Sumit Laddha

TL;DR

This work compares three generative modeling paradigms, Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and MeanFlow, and shows that CFM achieves an FID of 24.15 with 50 steps, far better than DDPM (FID 402.98).

Abstract

We present a comparative study of three generative modeling paradigms: Denoising Diffusion Probabilistic Models (DDPM), Conditional Flow Matching (CFM), and MeanFlow. While DDPM and CFM require iterative sampling, MeanFlow enables direct one-step generation by modeling the average velocity over time intervals. We implement all three methods with a unified TinyUNet architecture (<1.5M parameters) on CIFAR-10. CFM achieves an FID of 24.15 with 50 steps, far better than DDPM (FID 402.98). MeanFlow achieves an FID of 29.15 with single-step sampling, a 50× reduction in sampling steps (1 function evaluation versus 50 for CFM). We further extend CFM to image inpainting, implementing mask-guided sampling with four mask types (center, random bbox, irregular, half). Fine-tuning for inpainting yields substantial improvements on center masks: PSNR rises from 4.95 to 8.57 dB (+73%) and SSIM from 0.289 to 0.418 (+45%), demonstrating the effectiveness of inpainting-aware training.
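
To make the inpainting setup and the PSNR metric concrete, here is a minimal NumPy sketch of two of the four mask types, the compositing of known and generated pixels, and PSNR. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 50% hole size for the center mask, the convention that 1 marks pixels to fill, and the [0, 1] pixel range are all hypothetical choices not given in the abstract.

```python
import numpy as np

def center_mask(h, w, frac=0.5):
    """Mask of shape (h, w, 1): 1 = pixel to inpaint, 0 = known pixel.
    `frac` (side length of the hole relative to the image) is an assumed value."""
    mask = np.zeros((h, w, 1), dtype=np.float64)
    hole_h, hole_w = int(h * frac), int(w * frac)
    top, left = (h - hole_h) // 2, (w - hole_w) // 2
    mask[top:top + hole_h, left:left + hole_w] = 1.0
    return mask

def half_mask(h, w):
    """Mask that hides the right half of the image (assumed orientation)."""
    mask = np.zeros((h, w, 1), dtype=np.float64)
    mask[:, w // 2:] = 1.0
    return mask

def composite(known, generated, mask):
    """Keep known pixels where mask == 0 and generated pixels where mask == 1."""
    return (1.0 - mask) * known + mask * generated

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# Usage on a CIFAR-10-sized dummy image (32x32x3), with random stand-in data.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
fake_generation = rng.random((32, 32, 3))  # stand-in for model output
mask = center_mask(32, 32)
result = composite(image, fake_generation, mask)
score_db = psnr(image, result)
```

Because the composite copies the known region exactly, any error comes from the masked region alone. The abstract does not say whether PSNR is computed over the whole image or only over the hole.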

Paper Structure

This paper contains 32 sections, 10 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Overall FID and KID comparison across three methods. CFM and MeanFlow significantly outperform DDPM, with CFM achieving the best scores.
  • Figure 2: CFM generated samples (50 steps, CFG=3.0) for selected classes. Ship and automobile show clearest structure; frog demonstrates good texture capture.
  • Figure 3: MeanFlow generated samples with single-step sampling. Despite using only 1 NFE (vs 50 for CFM), samples maintain reasonable quality with recognizable objects and appropriate colors.
  • Figure 4: DDPM samples at epoch 399. Despite extended training (400 epochs), the model fails to generate coherent images, producing only noise-like patterns. This explains the poor FID of 402.98.
  • Figure 5: Per-class FID comparison between CFM and MeanFlow. "Ship" achieves the best FID across both methods, while "bird" and "dog" are most challenging.
  • ...and 6 more figures
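
The contrast in the Figure 3 caption between 1 network function evaluation (NFE) and 50 can be illustrated with a toy example. The sketch below is not the paper's model. It replaces the learned networks with closed-form fields for a data distribution that is a single point, and it assumes a time convention (t = 0 is noise, t = 1 is data) that the text above does not specify. The names `toy_velocity`, `toy_average_velocity`, `CountingField`, and `TARGET` are hypothetical.

```python
import numpy as np

TARGET = np.array([1.0, -2.0])  # hypothetical single "data" point

class CountingField:
    """Wraps a field and counts how often it is evaluated (the NFE)."""
    def __init__(self, fn):
        self.fn = fn
        self.nfe = 0

    def __call__(self, *args):
        self.nfe += 1
        return self.fn(*args)

def toy_velocity(x, t):
    """Instantaneous velocity of the straight-line flow toward TARGET (t < 1)."""
    return (TARGET - x) / (1.0 - t)

def toy_average_velocity(x, r, t):
    """Average velocity over [r, t] along the same flow, starting from x at time r.
    For this straight-line toy flow it does not depend on t."""
    return (TARGET - x) / (1.0 - r)

def euler_sample(velocity, x0, num_steps):
    """Iterative sampling: integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt)
    return x

def one_step_sample(avg_velocity, x0):
    """One-step sampling: a single jump using the average velocity over [0, 1]."""
    return x0 + avg_velocity(x0, 0.0, 1.0)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)                 # noise sample
v = CountingField(toy_velocity)
u = CountingField(toy_average_velocity)
x_multi = euler_sample(v, x0, num_steps=50)  # uses 50 evaluations of v
x_one = one_step_sample(u, x0)               # uses 1 evaluation of u
```

In this toy setting both samplers land on the same point. With learned networks, the one-step result is only as good as the learned average velocity, which is consistent with the reported gap between FID 29.15 (MeanFlow) and 24.15 (CFM).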