
From Missing Pieces to Masterpieces: Image Completion with Context-Adaptive Diffusion

Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Michael Felsberg, Dacheng Tao, Xuelong Li

TL;DR

ConFill tackles the core challenge of diffusion-based image completion: achieving seamless coherence between known and missing regions. It introduces Context-Adaptive Discrepancy (CAD), which modulates the denoising process with context-aware transport costs grounded in a Brenier-potential framework, and pairs it with Dynamic Sampling, which allocates more samples to structurally and texturally complex areas. By deriving a CAD-guided posterior and employing gradual approximation with time-travel capabilities, ConFill progressively aligns latent distributions across timesteps, reducing artifacts and improving both fidelity and perceptual quality. Across CelebA-HQ, Places2, and ImageNet, ConFill outperforms state-of-the-art diffusion methods and generalizes robustly to unseen masks, while also converging faster and performing competitively against non-diffusion approaches.
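The TL;DR does not spell out how ConFill's time travel is scheduled. A common form in diffusion inpainting is RePaint-style resampling, where the sampler periodically re-noises the current sample back up by a few timesteps and denoises again so that the generated region can harmonize with the known one. The sketch below assumes that form; `time_travel_schedule`, `step_kinds`, `jump_length`, and `jump_n_sample` are illustrative names, not taken from the paper.

```python
def time_travel_schedule(t_T, jump_length, jump_n_sample):
    """Timesteps visited by a RePaint-style sampler with time travel (hypothetical sketch).

    Every timestep t in range(0, t_T - jump_length, jump_length) is a jump point:
    on arriving there, the sampler re-noises back up by `jump_length` steps.
    This happens `jump_n_sample - 1` times per jump point.
    """
    remaining = {t: jump_n_sample - 1 for t in range(0, t_T - jump_length, jump_length)}
    t = t_T
    visited = [t]
    while t >= 1:
        t -= 1                      # reverse (denoising) step
        visited.append(t)
        if remaining.get(t, 0) > 0:
            remaining[t] -= 1
            for _ in range(jump_length):
                t += 1              # forward (re-noising) step: the "time travel"
                visited.append(t)
    return visited


def step_kinds(visited):
    """Label each transition as 'denoise' (t -> t-1) or 'renoise' (t -> t+1)."""
    return ["denoise" if b < a else "renoise" for a, b in zip(visited, visited[1:])]


if __name__ == "__main__":
    ts = time_travel_schedule(t_T=10, jump_length=3, jump_n_sample=2)
    print(ts)
    print(step_kinds(ts).count("renoise"))
```

With `t_T=10`, `jump_length=3`, and `jump_n_sample=2`, the schedule begins 10, 9, 8, 7, 6, 7, 8, 9, 8, ... and contains 9 re-noising steps. In a full sampler, each "denoise" transition would typically call the diffusion model and then paste in a suitably noised copy of the known pixels, while each "renoise" transition would sample from the forward process; how ConFill combines this with its CAD-guided posterior is not described in this section.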

Abstract

Image completion is a challenging task, particularly when ensuring that generated content seamlessly integrates with existing parts of an image. While recent diffusion models have shown promise, they often struggle with maintaining coherence between known and unknown (missing) regions. This issue arises from the lack of explicit spatial and semantic alignment during the diffusion process, resulting in content that does not smoothly integrate with the original image. Additionally, diffusion models typically rely on global learned distributions rather than localized features, leading to inconsistencies between the generated and existing image parts. In this work, we propose ConFill, a novel framework that introduces a Context-Adaptive Discrepancy (CAD) model to ensure that intermediate distributions of known and unknown regions are closely aligned throughout the diffusion process. By incorporating CAD, our model progressively reduces discrepancies between generated and original images at each diffusion step, leading to contextually aligned completion. Moreover, ConFill uses a new Dynamic Sampling mechanism that adaptively increases the sampling rate in regions with high reconstruction complexity. This approach enables precise adjustments, enhancing detail and integration in restored areas. Extensive experiments demonstrate that ConFill outperforms current methods, setting a new benchmark in image completion.
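The abstract describes Dynamic Sampling only as increasing the sampling rate in regions with high reconstruction complexity. The following is a minimal sketch of that idea under two stated assumptions: complexity is scored per patch as intensity variance (texture) plus mean absolute neighbour difference (structure), and "sampling rate" is a per-patch sample count. The scoring rule and the names `patch_complexity` and `allocate_samples` are hypothetical, not ConFill's definitions.

```python
def patch_complexity(img, y0, x0, size):
    """Hypothetical complexity score for one square patch of a grayscale image
    (list of lists): pixel variance plus mean absolute horizontal/vertical difference."""
    vals = [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    grads = []
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            if x + 1 < x0 + size:
                grads.append(abs(img[y][x + 1] - img[y][x]))
            if y + 1 < y0 + size:
                grads.append(abs(img[y + 1][x] - img[y][x]))
    return var + (sum(grads) / len(grads) if grads else 0.0)


def allocate_samples(img, size, base, extra_budget):
    """Give every patch `base` samples, then split the integer `extra_budget`
    in proportion to patch complexity (largest-remainder rounding, so the
    extra samples add up exactly). Assumes image sides are multiples of `size`."""
    h, w = len(img), len(img[0])
    coords = [(y, x) for y in range(0, h, size) for x in range(0, w, size)]
    scores = [patch_complexity(img, y, x, size) for y, x in coords]
    total = sum(scores)
    if total == 0:
        shares = [extra_budget / len(coords)] * len(coords)  # uniform if nothing is complex
    else:
        shares = [extra_budget * s / total for s in scores]
    counts = [int(s) for s in shares]
    leftover = max(0, extra_budget - sum(counts))
    # hand the remaining samples to the patches with the largest fractional parts
    order = sorted(range(len(coords)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return {c: base + n for c, n in zip(coords, counts)}


if __name__ == "__main__":
    # left half flat, right half a checkerboard texture
    img = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    print(allocate_samples(img, size=2, base=1, extra_budget=10))
```

On this toy image the two flat patches keep their single base sample while the two textured patches receive 6 samples each. Proportional allocation is only one plausible policy; a real implementation could instead threshold the scores or schedule extra samples over timesteps.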

Paper Structure

This paper contains 19 sections, 20 equations, 12 figures, 9 tables, and 2 algorithms.

Figures (12)

  • Figure 1: (a) Images generated by DPS, RePaint, and our ConFill using a fixed diffusion model on different masked inputs. (b) LPIPS score vs. denoising timesteps for a single input image.
  • Figure 2: The evolution of the discrepancy during the denoising process. This illustrates the mean pixel-wise discrepancy using Context-Adaptive Discrepancy (CAD) (red), Wasserstein Discrepancy (blue), and Euclidean Distance (orange).
  • Figure 3: The ConFill framework. The red curve represents the CAD inverse diffusion steps, which iteratively balance the distribution of image patches across the latent space. The green and red hills represent latent distributions near the final completed image and near the noisy initial steps, respectively.
  • Figure 4: Qualitative comparison on CelebA-HQ. Faces generated by ConFill show more distinctive facial features and greater similarity to, and coherence with, the original images than those produced by the baseline models.
  • Figure 5: Qualitative comparison on Places2. ConFill outperforms the other models, providing accurate and seamless restoration. The inpainted regions blend naturally with the surrounding landscape, maintaining consistent texture and structure across the images.
  • ...and 7 more figures
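Figure 2 contrasts CAD with a Wasserstein discrepancy and a Euclidean distance. CAD's context-adaptive cost is not defined in this section, so the sketch below only illustrates the two baselines on flattened pixel intensities, assuming equal-size 1D samples; `euclidean` and `wasserstein_1d` are illustrative names. It also shows the one-dimensional case of the Brenier connection mentioned in the TL;DR: in 1D the optimal transport map is the monotone rearrangement (a nondecreasing map, hence the gradient of a convex potential), so sorting both samples replaces a transport solver.

```python
import math


def euclidean(a, b):
    """Pixel-wise Euclidean distance: compares values at the same positions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wasserstein_1d(a, b, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1D samples.

    The optimal plan pairs the sorted samples (a monotone map, i.e. the
    gradient of a convex Brenier potential in 1D), so no solver is needed.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    pairs = zip(sorted(a), sorted(b))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(a)) ** (1 / p)


if __name__ == "__main__":
    known = [0.1, 0.2, 0.8, 0.9]
    generated = [0.9, 0.8, 0.2, 0.1]   # same intensities, different arrangement
    print(euclidean(known, generated), wasserstein_1d(known, generated))
```

Here the Euclidean distance is about 1.414 because the values sit at different positions, while the Wasserstein distance is 0 because the two intensity distributions are identical; shifting every value by 0.1 gives a Wasserstein distance of about 0.1. The contrast suggests why a purely distributional measure and a purely positional one each capture only part of known/unknown coherence; how CAD weights transport costs by context is described in the paper itself, not here.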