Image Fine-grained Inpainting
Zheng Hui, Jie Li, Xiumei Wang, Xinbo Gao
TL;DR
This paper tackles image inpainting, particularly restoring large missing regions with realistic global structure and fine-grained textures. It introduces a one-stage Dense Multi-Scale Fusion Network (DMFN) built from dense multi-scale fusion blocks (DMFB) and trained with several new losses: a self-guided regression loss that focuses on uncertain regions, a geometrical alignment constraint that preserves semantic localization, and discriminator feature matching within a RaGAN framework for local-global consistency. These are combined with a global/local discriminator and VGG-based perceptual losses into a single final objective. The method produces high-quality inpainted results on faces, buildings, and natural scenes, outperforming several state-of-the-art approaches on multiple datasets. Ablation studies confirm the contributions of DMFB, the self-guided regression loss, and the alignment constraint, in both qualitative appearance and quantitative metrics.
Abstract
Image inpainting techniques have recently shown promising improvement with the assistance of generative adversarial networks (GANs). However, most of them often produce completed results with unreasonable structure or blurriness. To mitigate this problem, we present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields. Benefiting from this property of the network, we can more easily recover large regions in an incomplete image. To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss that concentrates on uncertain areas and enhances semantic details. We also devise a geometrical alignment constraint term to compensate for the pixel-based distance between predicted features and ground-truth ones. We further employ a discriminator with local and global branches to ensure local-global content consistency. To further improve the quality of generated images, we introduce discriminator feature matching on the local branch, which dynamically minimizes the discrepancy between intermediate features of synthetic and ground-truth patches. Extensive experiments on several public datasets demonstrate that our approach outperforms current state-of-the-art methods. Code is available at https://github.com/Zheng222/DMFN.
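
To make the claim about larger receptive fields concrete, the following minimal sketch computes the receptive field of a sequential stack of convolutions using the standard recurrence. It is an illustration only: the function name `receptive_field`, the dilation rates (1, 2, 4, 8), and the sequential layout are assumptions chosen for the example and are not taken from the DMFB definition, which combines dilated branches densely rather than simply stacking them.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked convolutions.

    layers: list of (kernel_size, dilation, stride) tuples, in forward order.
    """
    rf = 1    # with no layers, one output position sees one input pixel
    jump = 1  # distance in input pixels between adjacent feature positions
    for kernel_size, dilation, stride in layers:
        # A dilated kernel spans dilation * (kernel_size - 1) + 1 positions.
        effective_kernel = dilation * (kernel_size - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical comparison: four 3x3 layers, stride 1 (rates are illustrative).
plain = [(3, 1, 1)] * 4
dilated = [(3, d, 1) for d in (1, 2, 4, 8)]

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 31
```

With the same number of 3x3 layers and the same parameter count, the dilated stack covers 31 pixels per axis instead of 9, which is the general reason dilated convolutions help when a large missing region must be filled from distant context.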
