Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Lintao Zhang, Xiangcheng Du, LeoWu TomyEnrique, Yiqun Wang, Yingbin Zheng, Cheng Jin

TL;DR

The paper tackles slow DDPM-based image inpainting by extending RePaint with three speed-up strategies: using a Light-Weight Diffusion Model (LWDM) with a perception-driven loss, skip-step DDIM sampling, and Coarse-to-Fine Sampling (CFS). These components form a two-stage, conditioned diffusion framework with dedicated modules for denoising (CDM) and resampling (CRM). Experimental results on CelebA-HQ and ImageNet across six mask types demonstrate substantial speedups (about $60\times$) with competitive inpainting quality. The approach enables faster, flexible diffusion-based inpainting suitable for interactive editing and broad mask distributions.

Abstract

For image inpainting, RePaint, the existing method based on the Denoising Diffusion Probabilistic Model (DDPM), can produce high-quality images for any inpainting form. It uses a pre-trained DDPM as a prior and generates inpainting results by conditioning the reverse diffusion process, i.e., the denoising process. However, this process is very time-consuming. In this paper, we propose an efficient DDPM-based image inpainting method that combines three speed-up strategies. First, we use a pre-trained Light-Weight Diffusion Model (LWDM) to reduce the number of parameters. Second, we adopt the skip-step sampling scheme of Denoising Diffusion Implicit Models (DDIM) for the denoising process. Finally, we propose Coarse-to-Fine Sampling (CFS), which speeds up inference by reducing the image resolution in the coarse stage and decreasing the number of denoising timesteps in the refinement stage. We conduct extensive experiments on both face and general-purpose image inpainting tasks, and our method achieves competitive performance with an approximately $60\times$ speedup.
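As a rough illustration of the skip-step idea mentioned in the abstract, the NumPy sketch below builds a strided timestep schedule and applies the standard deterministic DDIM update (eta = 0) between two noise levels. The linear beta schedule, the 1000 training steps, the stride of 50, and the function names are all assumptions chosen for illustration; they are not taken from the paper, and a trained noise predictor would supply `eps_pred` in practice.

```python
import numpy as np

def skip_step_schedule(num_train_steps, stride):
    """Timesteps visited when only every `stride`-th step is kept, noisy to clean."""
    return list(range(num_train_steps - 1, -1, -stride))

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """Deterministic DDIM update (eta = 0) from noise level abar_t to abar_prev."""
    # Estimate the clean image implied by the current sample and predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    # Re-noise that estimate to the (less noisy) target level.
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred

# Hypothetical linear beta schedule with 1000 training steps (an assumption).
betas = np.linspace(1e-4, 2e-2, 1000)
alpha_bars = np.cumprod(1.0 - betas)

# Keeping every 50th step leaves 20 network evaluations instead of 1000.
timesteps = skip_step_schedule(num_train_steps=1000, stride=50)

# Sanity check with an oracle noise predictor: one skip step moves the sample
# exactly onto the same (x0, eps) pair at the lower noise level.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
t, t_prev = timesteps[0], timesteps[1]
x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
x_prev = ddim_step(x_t, eps, alpha_bars[t], alpha_bars[t_prev])
expected = np.sqrt(alpha_bars[t_prev]) * x0 + np.sqrt(1.0 - alpha_bars[t_prev]) * eps
```

Because the number of network evaluations shrinks with the stride, this kind of schedule is one plausible source of the reported speedup; the coarse-to-fine stages would further change the resolution and step count per stage.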

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: (a) The single-stage DDPM-based inpainting method RePaint (Lugmayr et al., 2022). (b) Our proposed efficient DDPM-based method using Coarse-to-Fine Sampling (CFS).
  • Figure 2: Framework of our method. (a) The final result $x_0^f$ is sampled from random Gaussian noise $x_{T_c}$ through a two-stage, condition-guided reverse diffusion process. (b) The Denoise Block, which comprises $m$ CDMs and $n$ CRMs, denoises $x_t$ to $x_{t-ms}$. (c) The Conditioned Denoising Module (CDM) denoises $x_t$ to $x_{t-s}$ with the condition added. (d) The Conditioned Resampling Module (CRM) converts $x_t$ into a more harmonious $x_t$ by fusing the conditioning information with the generated content.
  • Figure 3: Visualization of the effect of the CRM.
  • Figure 4: Qualitative results of our method and state-of-the-art methods on the CelebA-HQ (Karras et al., 2017) and ImageNet (Krizhevsky et al., 2012) datasets over six different mask types.
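The timestep bookkeeping in the Figure 2 caption can be made concrete with a small sketch: each CDM lowers the timestep by $s$, so the $m$ CDMs of one Denoise Block take $x_t$ to $x_{t-ms}$, while CRMs leave the timestep unchanged. The blending function assumes RePaint-style mask fusion (known pixels from the conditioning image, remaining pixels from the generated sample) with the convention that 1 marks a known pixel; the paper's actual CDM and CRM internals may differ, and all names below are hypothetical.

```python
import numpy as np

def blend_with_condition(mask, known, generated):
    """Keep known pixels where mask == 1 and generated pixels where mask == 0."""
    return mask * known + (1.0 - mask) * generated

def block_timesteps(t, s, m):
    """Timesteps visited by the m CDMs of one Denoise Block: t, t-s, ..., t-m*s."""
    return [t - k * s for k in range(m + 1)]

# Toy 2x2 example: the top row is known, the bottom row is the hole.
mask = np.array([[1.0, 1.0], [0.0, 0.0]])
known = np.full((2, 2), 5.0)      # stand-in for the (noised) conditioning image
generated = np.zeros((2, 2))      # stand-in for the model's sample
fused = blend_with_condition(mask, known, generated)

# One block with skip stride s = 10 and m = 3 CDMs, starting at t = 100.
steps = block_timesteps(t=100, s=10, m=3)
```

Here `fused` keeps the known top row and the generated bottom row, and `steps` ends at $t - ms = 70$, matching the caption's description of a block's input and output timesteps.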