WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting
Kaitao Huang, Yan Yan, Jing-Hao Xue, Hanzi Wang
TL;DR
WarpGAN targets high-quality reconstruction of occluded regions in single-view 3D GAN inversion by combining depth-based warping with a dedicated inpainting network. The method projects the input image into the latent space $\mathcal{W}^+$, renders a depth map to warp the image to a novel view, and uses SVINet, which exploits a symmetry prior and $w^+$-driven style modulation, to inpaint the occluded regions with multi-view consistency. Training combines real single-view data with synthetic pairs, using reconstruction, consistency, and adversarial losses plus a reverse-warp loss that provides supervision without ground-truth novel views. Experiments on CelebA-HQ and MEAD show that WarpGAN surpasses both optimization-based and encoder-based baselines in quantitative metrics and qualitative fidelity, and that it supports editing workflows via PTI-inspired optimization. The results indicate that warping followed by inpainting is an effective way to fill occlusions and maintain coherence across views in single-image 3D-aware synthesis.
Abstract
3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis, which requires reconstructing visible regions with high fidelity and generating occluded regions that are realistic and multi-view consistent. However, existing methods focus on the reconstruction of visible regions, while the generation of occluded regions relies only on the generative prior of the 3D GAN. As a result, the generated occluded regions often exhibit poor quality due to the information loss caused by the low bit-rate latent code. To address this, we introduce a warping-and-inpainting strategy that incorporates image inpainting into 3D GAN inversion and propose a novel 3D GAN inversion method, WarpGAN. Specifically, we first employ a 3D GAN inversion encoder to project the single-view image into a latent code that serves as the input to the 3D GAN. Then, we warp the image to a novel view using the depth map generated by the 3D GAN. Finally, we develop a novel network, SVINet, which leverages the symmetry prior and the multi-view image correspondence w.r.t. the same latent code to inpaint the occluded regions in the warped image. Quantitative and qualitative experiments demonstrate that our method consistently outperforms several state-of-the-art methods.
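To make the warping step concrete, the following is a minimal sketch of depth-based forward warping, not the authors' implementation. It assumes a pinhole camera with intrinsics `K` whose last row is `[0, 0, 1]`, a relative pose `(R, t)` from the source camera to the novel camera, and nearest-pixel splatting with a z-buffer; the function name and all parameter names are hypothetical. The returned hole mask marks the pixels that no source pixel reaches, which is the kind of region an inpainting network such as SVINet would be asked to fill.

```python
import numpy as np


def warp_to_novel_view(image, depth, K, R, t):
    """Forward-warp `image` (H, W, C) into a novel view using per-pixel depth.

    K: 3x3 pinhole intrinsics (assumed to have last row [0, 0, 1]).
    R, t: rotation (3x3) and translation (3,) mapping source-camera
          coordinates to novel-camera coordinates.
    Returns (warped, hole): the warped image and a boolean (H, W) mask that
    is True where no source pixel landed.
    """
    H, W = depth.shape
    C = image.shape[2]

    # Homogeneous pixel coordinates (u, v, 1) for every source pixel.
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Back-project to 3D with the depth map, then move to the novel camera.
    pts_src = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)
    pts_tgt = R @ pts_src + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts_tgt[2]

    # Project points that lie in front of the novel camera.
    front = z > 1e-8
    proj = K @ pts_tgt
    uv = np.full((2, z.size), -1.0)
    uv[:, front] = proj[:2, front] / z[front]
    u_t = np.rint(uv[0]).astype(np.int64)
    v_t = np.rint(uv[1]).astype(np.int64)
    inside = front & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)

    # Z-buffer: for each target pixel keep only the nearest source point.
    idx = np.flatnonzero(inside)
    tgt = v_t[idx] * W + u_t[idx]
    order = np.lexsort((z[idx], tgt))  # primary key: target pixel, then depth
    idx, tgt = idx[order], tgt[order]
    nearest = np.ones(tgt.size, dtype=bool)
    nearest[1:] = tgt[1:] != tgt[:-1]
    idx, tgt = idx[nearest], tgt[nearest]

    warped = np.zeros((H * W, C), dtype=image.dtype)
    hole = np.ones(H * W, dtype=bool)
    warped[tgt] = image.reshape(H * W, C)[idx]
    hole[tgt] = False
    return warped.reshape(H, W, C), hole.reshape(H, W)
```

With an identity pose the function returns the input unchanged and an empty hole mask; a sideways camera translation shifts the content and exposes a band of holes along one image border. Practical pipelines often use softer splatting than the nearest-pixel rule assumed here.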
