Image Inpainting via Conditional Texture and Structure Dual Generation
Xiefan Guo, Hongyu Yang, Di Huang
TL;DR
The paper tackles large-hole image inpainting with a dual-generation framework that jointly models structure-constrained texture synthesis and texture-guided structure reconstruction. A Bi-directional Gated Feature Fusion (Bi-GFF) module exchanges and combines information between the two streams, a Contextual Feature Aggregation (CFA) module refines the generated content, and a two-stream discriminator enforces both texture and structure fidelity. The approach builds on partial convolutions and is trained with a composite loss made of reconstruction ($\mathcal{L}_{rec}$), perceptual ($\mathcal{L}_{perc}$), style ($\mathcal{L}_{style}$), adversarial ($\mathcal{L}_{adv}$), and $\mathcal{L}_{inter}$ terms. Experiments on CelebA, Paris StreetView, and Places2 report state-of-the-art results with sharp, globally consistent completions, and extensive ablations show the benefits of the structure priors, the dual-generation design, and multi-scale contextual aggregation. The code is publicly released for reproducibility.
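The TL;DR notes that the approach builds on partial convolutions. To make that operation concrete, here is a minimal single-channel sketch of the partial convolution of Liu et al. (2018), assuming stride 1, no padding, and a binary mask where 1 marks a known pixel and 0 marks a hole. The function name `partial_conv2d` and its signature are hypothetical illustrations, not part of the released CTSDG code.

```python
from typing import List, Tuple

Grid = List[List[float]]


def partial_conv2d(
    x: Grid, mask: Grid, weight: Grid, bias: float = 0.0
) -> Tuple[Grid, Grid]:
    """Single-channel, stride-1, 'valid' partial convolution (hypothetical helper).

    mask[i][j] is 1.0 for known pixels and 0.0 for holes.
    Returns the convolved output and the updated mask.
    """
    k = len(weight)
    rows = len(x) - k + 1
    cols = len(x[0]) - k + 1
    window_size = k * k
    out: Grid = [[0.0] * cols for _ in range(rows)]
    new_mask: Grid = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            valid = 0.0
            for di in range(k):
                for dj in range(k):
                    m = mask[i + di][j + dj]
                    # Only known pixels contribute to the response.
                    acc += weight[di][dj] * x[i + di][j + dj] * m
                    valid += m
            if valid > 0:
                # Rescale by sum(1)/sum(M) so the response does not shrink
                # when part of the window falls inside the hole.
                out[i][j] = acc * (window_size / valid) + bias
                # Any window that saw at least one known pixel becomes known.
                new_mask[i][j] = 1.0
            # Otherwise the window lies entirely in the hole: output and mask stay 0.
    return out, new_mask
```

For example, with a 3×3 window whose values are all 2.0, unit weights, and only the center pixel masked out, the masked sum is 16.0 and the 9/8 rescaling yields 18.0, the same response as a fully known window. The mask update is what lets holes shrink layer by layer in a partial-convolution encoder.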
Abstract
Deep generative approaches have recently made considerable progress in image inpainting by introducing structure priors. However, because structure reconstruction lacks proper interaction with image texture, current solutions struggle with large corruptions and generally produce distorted results. In this paper, we propose a novel two-stream network for image inpainting, which models structure-constrained texture synthesis and texture-guided structure reconstruction in a coupled manner so that the two tasks better leverage each other for more plausible generation. Furthermore, to enhance global consistency, a Bi-directional Gated Feature Fusion (Bi-GFF) module is designed to exchange and combine structure and texture information, and a Contextual Feature Aggregation (CFA) module is developed to refine the generated content through region affinity learning and multi-scale feature aggregation. Qualitative and quantitative experiments on the CelebA, Paris StreetView and Places2 datasets demonstrate the superiority of the proposed method. Our code is available at https://github.com/Xiefan-Guo/CTSDG.
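The gated exchange that Bi-GFF performs between the structure and texture streams can be illustrated at a single spatial location. The sketch below assumes each stream's feature is a plain channel vector, replaces the module's convolutions with 1×1 projections (matrix-vector products), and uses sigmoid gates so that each stream adds a gated copy of the other before a final fusion projection. The function names, weight shapes, and this exact gating form are illustrative assumptions, not the released implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, x: Vector) -> Vector:
    # A 1x1 convolution at one spatial location is a matrix-vector product over channels.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def bi_gated_fusion(
    f_s: Vector, f_t: Vector, w_gate_t: Matrix, w_gate_s: Matrix, w_out: Matrix
) -> Vector:
    """Hypothetical bidirectional gated fusion of structure (f_s) and texture (f_t) features."""
    # Gate controlling how much texture information flows into the structure stream.
    g_t = [sigmoid(v) for v in matvec(w_gate_t, f_s + f_t)]
    f_s_new = [s + g * t for s, g, t in zip(f_s, g_t, f_t)]
    # Gate controlling how much structure information flows into the texture stream.
    g_s = [sigmoid(v) for v in matvec(w_gate_s, f_t + f_s)]
    f_t_new = [t + g * s for t, g, s in zip(f_t, g_s, f_s)]
    # Fuse the two updated streams with a final projection over their concatenation.
    return matvec(w_out, f_s_new + f_t_new)
```

With all gate weights set to zero, every gate equals 0.5, so each stream receives half of the other: for `f_s = [1.0, 0.0]` and `f_t = [0.0, 2.0]` the updated streams are `[1.0, 1.0]` and `[0.5, 2.0]`. Strongly negative gate weights close the gates and leave each stream essentially unchanged, which is the behavior a learned gate can use to suppress unhelpful cross-stream information.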
