Learning Position-Aware Implicit Neural Network for Real-World Face Inpainting
Bo Zhao, Huan Yang, Jianlong Fu
TL;DR
This work tackles real-world face inpainting where input shapes and resolutions vary widely, a scenario where prior methods falter in preserving position-sensitive facial structures. It introduces IN^2, an implicit neural inpainting network with a Downsample Processing Encoder, Neighbor Hybrid Attention Blocks, and an Implicit Neural Pyramid Decoder, plus an Adaptive Training Strategy to handle irregular shapes. By explicitly modeling position information through a coordinate-aware decoding pipeline, IN^2 achieves state-of-the-art results on CelebA-HQ in both ideal and real-world settings, notably improving eyes and mouth restoration under arbitrary aspect ratios. The approach demonstrates the practicality of integrating implicit neural representations into face inpainting, enabling robust high-resolution performance without restricting input shape and size.
Abstract
Face inpainting requires a model to have a precise global understanding of facial position structure. Benefiting from the powerful capabilities of deep learning backbones, recent works in face inpainting have achieved decent performance in the ideal setting (square images at $512$ px). However, existing methods often produce visually unpleasant results, especially in position-sensitive details (e.g., eyes and nose), when directly applied to arbitrary-shaped images in real-world scenarios. These artifacts indicate that existing methods fall short in processing position information. In this paper, we propose an \textbf{I}mplicit \textbf{N}eural \textbf{I}npainting \textbf{N}etwork (IN$^2$) that handles arbitrary-shaped face images in real-world scenarios by explicitly modeling position information. Specifically, a downsample processing encoder is proposed to reduce information loss while obtaining the global semantic feature. A neighbor hybrid attention block with a hybrid attention mechanism is proposed to improve the model's facial understanding without restricting the shape of the input. Finally, an implicit neural pyramid decoder is introduced to explicitly model position information and bridge the gap between low-resolution features and high-resolution output. Extensive experiments demonstrate the superiority of the proposed method on the real-world face inpainting task.
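The abstract does not describe how the decoder is built, so the following is only a minimal sketch of the general idea behind coordinate-conditioned (implicit) decoding, not the IN$^2$ design. It assumes a nearest-cell feature lookup and a two-layer MLP with externally supplied weights; the names `pixel_centers` and `implicit_decode` and all parameters are hypothetical. The point it illustrates is that a fixed low-resolution feature map can be queried at any output height and width, because each output pixel is decoded from a feature plus its relative coordinate.

```python
import numpy as np


def pixel_centers(n):
    """Centers of n equal cells spanning [-1, 1]."""
    return (np.arange(n) + 0.5) / n * 2.0 - 1.0


def implicit_decode(feat, out_h, out_w, w1, b1, w2, b2):
    """Decode an (out_h, out_w, 3) image from a (C, h, w) feature map.

    Hypothetical sketch: each output pixel is predicted by a small MLP from
    the nearest low-resolution feature and the pixel's offset to that cell.
    """
    c, h, w = feat.shape
    ys, xs = pixel_centers(out_h), pixel_centers(out_w)

    # Index of the nearest low-resolution cell for each output row / column.
    iy = np.clip(np.floor((ys + 1.0) / 2.0 * h).astype(int), 0, h - 1)
    ix = np.clip(np.floor((xs + 1.0) / 2.0 * w).astype(int), 0, w - 1)

    # Offset of each query coordinate from the center of its cell.
    dy = ys - pixel_centers(h)[iy]
    dx = xs - pixel_centers(w)[ix]

    near = feat[:, iy[:, None], ix[None, :]]                 # (C, out_h, out_w)
    rel = np.stack(np.broadcast_arrays(dy[:, None], dx[None, :]))  # (2, out_h, out_w)

    # One row per output pixel: [feature, dy, dx].
    x = np.concatenate([near, rel], axis=0).reshape(c + 2, -1).T

    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU layer
    rgb = hidden @ w2 + b2
    return rgb.reshape(out_h, out_w, 3)
```

For example, a feature map of shape `(8, 4, 6)` can be decoded to a non-square `9 x 13` output by calling `implicit_decode(feat, 9, 13, w1, b1, w2, b2)` with `w1` of shape `(10, 16)` and `w2` of shape `(16, 3)`; the output size is a free argument and is not tied to the feature resolution.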
