Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models

Guanhua Zhang, Jiabao Ji, Yang Zhang, Mo Yu, Tommi Jaakkola, Shiyu Chang

TL;DR

CoPaint tackles the coherence gap in diffusion-model inpainting with fixed pre-trained models by casting inpainting as Bayesian posterior optimization that jointly updates revealed and unrevealed regions at every denoising step. It introduces a tractable posterior via a one-step generation surrogate and a gradually shrinking approximation error, achieving coherence without violating the inpainting constraint. Extensions like CoPaint-TT (Time Travel) and Multi-Step Approximation further improve self-consistency, yielding superior LPIPS and subjective coherence on CelebA-HQ and ImageNet compared to strong baselines. The method demonstrates favorable runtime trade-offs and broad applicability to inpainting and related restoration tasks, marking a practical advance for diffusion-based image completion.

Abstract

Image inpainting refers to the task of generating a complete, natural image based on a partially revealed reference image. Recently, much research interest has focused on addressing this problem using fixed diffusion models. These approaches typically replace the revealed region of the intermediate or final generated images directly with that of the reference image or its variants. However, because the unrevealed regions are not directly modified to match the context, the result is incoherence between revealed and unrevealed regions. To address the incoherence problem, a small number of methods introduce a rigorous Bayesian framework, but they tend to introduce mismatches between the generated and the reference images due to approximation errors in computing the posterior distributions. In this paper, we propose COPAINT, which can coherently inpaint the whole image without introducing mismatches. COPAINT also uses the Bayesian framework to jointly modify both revealed and unrevealed regions, but approximates the posterior distribution in a way that allows the errors to gradually drop to zero throughout the denoising steps, thus strongly penalizing any mismatches with the reference image. Our experiments verify that COPAINT can outperform the existing diffusion-based methods under both objective and subjective metrics. The code is available at https://github.com/UCSB-NLP-Chang/CoPaint/.
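
To make the replacement-style baseline described in the abstract concrete, the following is a minimal sketch of one such step, not the code of any specific method. It assumes the standard forward-noising rule with a cumulative noise level `alpha_bar_t`; the function name `replace_revealed` and the convention that `mask == 1` marks revealed pixels are illustrative choices, not taken from the paper.

```python
import numpy as np


def replace_revealed(x_t, reference, mask, alpha_bar_t, rng):
    """Overwrite the revealed pixels of an intermediate sample x_t.

    Assumptions (hypothetical, for illustration only):
      - mask == 1 marks revealed pixels, mask == 0 marks unrevealed ones;
      - the reference is noised to the current step with the usual rule
        sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
    """
    noise = rng.standard_normal(reference.shape)
    noised_ref = np.sqrt(alpha_bar_t) * reference + np.sqrt(1.0 - alpha_bar_t) * noise
    # Only the revealed region is touched; the unrevealed region is left
    # exactly as the sampler produced it.
    return mask * noised_ref + (1.0 - mask) * x_t
```

The last line of the function shows where the incoherence comes from: the unrevealed pixels of `x_t` pass through unchanged, so nothing in this step pushes them to agree with the reference content that was just pasted in.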

Paper Structure (34 sections, 18 equations, 18 figures, 4 tables, 1 algorithm)

Figures (18)

  • Figure 1: Inpainted images by Blended (b), DDRM (c) and our proposed method CoPaint-TT (d). Images are generated conditioned on the given masked input (a) with a fixed diffusion model.
  • Figure 2: The trajectory of the gap between $\bm f_\theta^{(t)} (\tilde{\bm X}_t)$ and $\tilde{\bm X}_0$ along the unconditional diffusion denoising process. We report the pixel-wise averaged Euclidean distance between the two.
  • Figure 3: Time-performance trade-off on CelebA-HQ (left) and ImageNet (right). The x-axis indicates the average time ($\downarrow$) to process one image, and the y-axis is the average LPIPS ($\downarrow$).
  • Figure 4: Qualitative results of baselines and ours (CoPaint, CoPaint-TT) on CelebA-HQ with seven degradation masks.
  • Figure 5: Qualitative results of baselines and ours (CoPaint, CoPaint-TT) on ImageNet with seven degradation masks.
  • ...and 13 more figures
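
The Figure 2 caption refers to the gap between a one-step estimate $\bm f_\theta^{(t)} (\tilde{\bm X}_t)$ and the final image $\tilde{\bm X}_0$. The sketch below shows how such a gap could be computed, assuming the one-step estimate takes the standard DDIM "predicted clean image" form; this page does not show the paper's exact definition, so treat the formula as an assumption. The names `one_step_estimate` and `pixelwise_gap` are hypothetical, `eps_pred` stands in for the output of a noise-prediction network, and reading "pixel-wise averaged Euclidean distance" as the per-pixel norm over channels, averaged over pixels, is one possible interpretation.

```python
import numpy as np


def one_step_estimate(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean image from x_t in a single step.

    Assumes the standard DDIM form
        (x_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t),
    where eps_pred is the noise predicted for x_t at this step.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


def pixelwise_gap(a, b):
    """Euclidean distance over the channel axis, averaged over all pixels."""
    return float(np.mean(np.linalg.norm(a - b, axis=-1)))
```

If the predicted noise equals the true noise, the estimate recovers the clean image exactly and the gap is zero. With an imperfect prediction the gap is nonzero, and the quantity plotted in Figure 2 is how that gap evolves along the denoising trajectory.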