CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models
Kuan-Hung Liu, Cheng-Kun Yang, Min-Hung Chen, Yu-Lun Liu, Yen-Yu Lin
TL;DR
CorrFill is a training-free module for faithful reference-based inpainting that imposes explicit correspondence constraints between a reference image and a damaged target image, using the self-attention layers of a diffusion model. It stitches the reference and target together, derives correspondences from aggregated attention maps, and refines them in a cyclic loop while guiding denoising through attention masking and latent-tensor optimization. Across several baselines on RealEstate10K and MegaDepth, the method improves faithfulness to the reference, with substantial PSNR/SSIM gains and fewer artifacts, though it struggles with complex geometry and large viewpoint changes. The result is a practical, training-free way to improve alignment between references and inpainted results, with potential applications to controllability in other diffusion-model tasks.
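To make the correspondence step concrete, here is a minimal NumPy sketch of one way to read matches off aggregated self-attention over a stitched token sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `estimate_correspondences`, averaging over layers as the aggregation, and the argmax matching rule are all hypothetical choices made for this example.

```python
import numpy as np

def estimate_correspondences(attn_maps, ref_idx, tgt_idx):
    """Match each target token to one reference token.

    attn_maps: (L, N, N) self-attention maps from L layers over the N
               tokens of the stitched reference+target image.
    ref_idx:   (R,) token indices belonging to the reference half.
    tgt_idx:   (T,) token indices belonging to the target half.
    Returns (matches, confidence), each of shape (T,).
    """
    # Assumption: aggregate layers with a plain mean.
    agg = attn_maps.mean(axis=0)                 # (N, N)
    # Keep only attention from target queries to reference keys.
    tgt_to_ref = agg[np.ix_(tgt_idx, ref_idx)]   # (T, R)
    # Assumption: the strongest attended reference token is the match.
    best = tgt_to_ref.argmax(axis=1)             # positions within ref_idx
    return ref_idx[best], tgt_to_ref.max(axis=1)

# Toy usage: 3 reference tokens (0..2) and 3 target tokens (3..5).
attn = np.zeros((2, 6, 6))
attn[:, 3, 2] = 0.9
attn[:, 4, 0] = 0.8
attn[:, 5, 1] = 0.7
matches, confidence = estimate_correspondences(
    attn, np.array([0, 1, 2]), np.array([3, 4, 5])
)
```

In this toy case each target token has a single dominant reference token, so the argmax is unambiguous; real attention maps are noisier, which is presumably why the method refines its estimates over the course of denoising.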
Abstract
In reference-based image inpainting, an additional reference image is provided to help restore a damaged target image to its original state. Advances in diffusion models, particularly Stable Diffusion, allow simple formulations of this task. However, existing diffusion-based methods often lack explicit constraints on the correlation between the reference and damaged images, which lowers the faithfulness of the inpainting results to the reference images. In this work, we propose CorrFill, a training-free module designed to enhance awareness of the geometric correlations between the reference and target images. It does so by guiding the inpainting process with correspondence constraints estimated during inpainting, using attention masking in the self-attention layers and an objective function that updates the input tensor according to the constraints. Experimental results demonstrate that CorrFill significantly improves multiple diffusion-based baselines, including state-of-the-art approaches, by emphasizing faithfulness to the reference images.
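The attention-masking idea can be sketched as follows. This is a hedged illustration only: the helper names `correspondence_mask` and `masked_softmax`, the `radius` parameter, and the rule "a target token may attend only to reference tokens near its matched location" are assumptions introduced here, and the paper's actual masking scheme may differ.

```python
import numpy as np

def correspondence_mask(n_tokens, tgt_idx, ref_idx, matches, ref_coords, radius):
    """Build an additive attention mask from estimated correspondences.

    matches[i] is the position (within ref_idx) of the reference token
    matched to target token tgt_idx[i]; ref_coords is (R, 2) with the
    spatial location of each reference token.
    """
    mask = np.zeros((n_tokens, n_tokens))
    for t, m in zip(tgt_idx, matches):
        # Distance of every reference token to the matched location.
        dist = np.linalg.norm(ref_coords - ref_coords[m], axis=1)
        # Block reference tokens outside the assumed neighbourhood.
        mask[t, ref_idx[dist > radius]] = -np.inf
    return mask

def masked_softmax(scores, mask):
    """Row-wise softmax of attention scores plus an additive mask."""
    z = scores + mask
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(z)                           # exp(-inf) == 0 for blocked keys
    return w / w.sum(axis=-1, keepdims=True)

# Toy usage: reference tokens 0..2 laid out in a row, target tokens 3..5.
ref_idx = np.array([0, 1, 2])
tgt_idx = np.array([3, 4, 5])
ref_coords = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
mask = correspondence_mask(6, tgt_idx, ref_idx, [2, 0, 1], ref_coords, radius=0.5)
weights = masked_softmax(np.zeros((6, 6)), mask)
```

Target-to-target attention is left unmasked, so every row keeps at least one finite entry and the softmax stays well defined. Reference rows are untouched in this sketch.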
