From Missing Pieces to Masterpieces: Image Completion with Context-Adaptive Diffusion
Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Michael Felsberg, Dacheng Tao, Xuelong Li
TL;DR
ConFill tackles the core challenge of diffusion-based image completion: achieving seamless coherence between known and missing regions. It introduces Context-Adaptive Discrepancy (CAD), which modulates the denoising process with context-aware transport costs grounded in a Brenier-potential framework. It pairs CAD with Dynamic Sampling, which allocates more samples to structurally and texturally complex areas. By deriving a CAD-guided posterior and employing gradual approximation with time-travel capabilities, ConFill progressively aligns latent distributions across timesteps, reducing artifacts and improving both fidelity and perceptual quality. Across CelebA-HQ, Places2, and ImageNet, ConFill outperforms state-of-the-art diffusion methods and generalizes robustly to unseen masks. It also converges faster and performs competitively against non-diffusion approaches.
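The summary does not specify CAD's context-aware costs. The sketch below therefore illustrates only the underlying optimal-transport idea in its simplest setting: quadratic cost between two equal-size 1-D samples, such as pixel intensities from the known context and from the generated fill. In 1-D, the optimal map is the monotone rearrangement. You sort both samples and match them by rank. That map is non-decreasing, so it is the derivative of a convex (Brenier) potential. The function names and data are hypothetical, and this is not ConFill's CAD.

```python
def monotone_transport(source, target):
    """Quadratic-cost optimal transport between two equal-size 1-D empirical samples.

    In 1-D the optimal (Brenier) map is the monotone rearrangement: the k-th
    smallest source value is matched to the k-th smallest target value. The
    matching is non-decreasing, i.e. the derivative of a convex potential."""
    if len(source) != len(target) or not source:
        raise ValueError("expects two non-empty samples of equal size")
    return list(zip(sorted(source), sorted(target)))


def w2_squared(source, target):
    """Squared 2-Wasserstein distance between the two empirical distributions."""
    pairs = monotone_transport(source, target)
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    known_context = [0.1, 0.5, 0.9, 0.3]  # illustrative intensities, not real data
    generated = [0.4, 0.0, 1.0, 0.2]
    print(monotone_transport(known_context, generated))
    print(w2_squared(known_context, generated))  # about 0.01
    # Reordering pixels leaves the distance unchanged: it compares distributions.
    print(w2_squared(known_context, list(reversed(known_context))))  # 0.0
```

The cost depends only on the value distributions, not on pixel order. A discrepancy of this kind can therefore measure whether a fill "looks like" its context even where no pixel-wise target exists.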
Abstract
Image completion is a challenging task, particularly in ensuring that generated content integrates seamlessly with the existing parts of an image. While recent diffusion models have shown promise, they often struggle to maintain coherence between known and unknown (missing) regions. This issue arises from the lack of explicit spatial and semantic alignment during the diffusion process, which produces content that does not blend smoothly with the original image. Additionally, diffusion models typically rely on globally learned distributions rather than localized features, leading to inconsistencies between the generated and existing image parts. In this work, we propose ConFill, a novel framework that introduces a Context-Adaptive Discrepancy (CAD) model to keep the intermediate distributions of known and unknown regions closely aligned throughout the diffusion process. By incorporating CAD, our model progressively reduces the discrepancy between the generated content and the original image at each diffusion step, yielding contextually aligned completions. Moreover, ConFill uses a new Dynamic Sampling mechanism that adaptively increases the sampling rate in regions with high reconstruction complexity. This concentrates refinement where it is most needed, enhancing detail and integration in the restored areas. Extensive experiments demonstrate that ConFill outperforms current methods and sets a new benchmark in image completion.
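The abstract does not define how ConFill measures reconstruction complexity or converts it into a sampling rate. The minimal sketch below uses a hypothetical complexity proxy instead. Each tile of the image is scored as its number of missing pixels times (1 + the mean absolute finite difference between adjacent known pixels). A fixed budget of extra samples is then split across tiles in proportion to those scores. The functions `tile_scores` and `allocate_samples`, the tile size, and the scoring rule are all illustrative assumptions, not ConFill's mechanism.

```python
def tile_scores(image, known, tile):
    """Hypothetical complexity score for each tile x tile block (row-major order).

    score = (#missing pixels) * (1 + mean |finite difference| between adjacent
    known pixels), so fully known tiles score 0 and textured holes score higher.
    `known[r][c]` is True where the pixel is observed."""
    h, w = len(image), len(image[0])
    scores = []
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            missing, grads = 0, []
            for r in range(r0, min(r0 + tile, h)):
                for c in range(c0, min(c0 + tile, w)):
                    if not known[r][c]:
                        missing += 1
                        continue
                    # Differences to the right and below, known neighbours only
                    # (a neighbour may lie in the adjacent tile, adding context).
                    if c + 1 < w and known[r][c + 1]:
                        grads.append(abs(image[r][c + 1] - image[r][c]))
                    if r + 1 < h and known[r + 1][c]:
                        grads.append(abs(image[r + 1][c] - image[r][c]))
            texture = sum(grads) / len(grads) if grads else 0.0
            scores.append(missing * (1.0 + texture))
    return scores


def allocate_samples(scores, budget):
    """Split an integer sample budget in proportion to `scores`.

    Largest-remainder rounding makes the counts sum exactly to `budget`."""
    total = sum(scores)
    if total == 0:
        return [0] * len(scores)
    raw = [budget * s / total for s in scores]
    counts = [int(x) for x in raw]  # floor, since raw values are non-negative
    leftover = budget - sum(counts)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    image = [
        [0, 0, 0, 1],
        [0, 0, 1, 5],  # the 5 sits in a hole and is ignored
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    known = [[True] * 4 for _ in range(4)]
    known[1][3] = False  # hole in a textured tile
    known[3][0] = False  # hole in a flat tile
    scores = tile_scores(image, known, tile=2)
    print(scores)                        # [0.0, 2.0, 1.0, 0.0]
    print(allocate_samples(scores, 10))  # [0, 7, 3, 0]
```

Fully known tiles score zero and receive no extra samples. A hole surrounded by texture receives more samples than an equally sized hole in a flat area, which is the qualitative behaviour the abstract describes.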
