
WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting

Kaitao Huang, Yan Yan, Jing-Hao Xue, Hanzi Wang

TL;DR

WarpGAN addresses the poor quality of occluded regions in single-view 3D GAN inversion by combining depth-based warping with a dedicated inpainting network. The method projects the input image into the latent space $\mathcal{W}^+$, renders a depth map with the 3D GAN to warp the image to a novel view, and then uses SVINet, which combines a symmetry prior with style modulation driven by the latent code $w^+$, to inpaint the occluded regions while keeping views consistent. Training combines real single-view data with synthetic pairs under reconstruction, consistency, and adversarial losses, plus a reverse-warp loss that provides supervision without ground-truth novel views. Experiments on CelebA-HQ and MEAD show that WarpGAN surpasses both optimization-based and encoder-based baselines in quantitative metrics and qualitative fidelity, and that it supports image attribute editing via PTI-inspired optimization. By reusing visible content through warping and inpainting only the occluded regions, the approach improves single-image 3D-aware synthesis while maintaining coherence across views.

Abstract

3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis, which requires high fidelity in visible regions and realism and multi-view consistency in occluded regions. However, existing methods focus on the reconstruction of visible regions, while the generation of occluded regions relies only on the generative prior of the 3D GAN. As a result, the generated occluded regions often exhibit poor quality due to the information loss caused by the low bit-rate latent code. To address this, we introduce the warping-and-inpainting strategy to incorporate image inpainting into 3D GAN inversion and propose a novel 3D GAN inversion method, WarpGAN. Specifically, we first employ a 3D GAN inversion encoder to project the single-view image into a latent code that serves as the input to the 3D GAN. Then, we perform warping to a novel view using the depth map generated by the 3D GAN. Finally, we develop a novel SVINet, which leverages the symmetry prior and multi-view image correspondence w.r.t. the same latent code to perform inpainting of occluded regions in the warped image. Quantitative and qualitative experiments demonstrate that our method consistently outperforms several state-of-the-art methods.
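The warping step described above can be illustrated with a minimal sketch of depth-based forward warping. This is not the authors' implementation: the pinhole intrinsics `K`, the relative pose `(R, t)`, nearest-pixel splatting, and the z-buffer tie-breaking are all assumptions made for illustration, and the function name `forward_warp` is hypothetical. In WarpGAN the depth map comes from the 3D GAN; here it is simply an input array.

```python
import numpy as np

def forward_warp(image, depth, K, R, t):
    """Warp `image` (H, W, C) to a novel view using per-pixel `depth` (H, W).

    Assumptions (not from the paper): K is a 3x3 pinhole intrinsic matrix and
    (R, t) maps source-camera coordinates to target-camera coordinates.
    Returns the warped image and a boolean mask of target pixels that
    received a source pixel; unfilled pixels are the holes left for inpainting.
    """
    H, W = depth.shape
    C = image.shape[-1]
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Back-project every source pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)
    # Move the points into the target camera frame and project them.
    pts_t = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    proj = K @ pts_t
    z = pts_t[2]

    # Keep only points in front of the target camera that land inside the image.
    front = z > 1e-8
    u_t = np.zeros(z.shape, dtype=np.int64)
    v_t = np.zeros(z.shape, dtype=np.int64)
    u_t[front] = np.round(proj[0, front] / z[front]).astype(np.int64)
    v_t[front] = np.round(proj[1, front] / z[front]).astype(np.int64)
    inside = front & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)

    # Z-buffer: when several source pixels hit one target pixel, keep the nearest.
    idx = np.flatnonzero(inside)
    flat = v_t[idx] * W + u_t[idx]
    order = np.lexsort((z[idx], flat))  # primary key: target pixel, secondary: depth
    flat_sorted = flat[order]
    first = np.ones(flat_sorted.shape, dtype=bool)
    first[1:] = flat_sorted[1:] != flat_sorted[:-1]
    winners = idx[order][first]
    targets = flat_sorted[first]

    warped = np.zeros((H * W, C), dtype=image.dtype)
    mask = np.zeros(H * W, dtype=bool)
    warped[targets] = image.reshape(-1, C)[winners]
    mask[targets] = True
    return warped.reshape(H, W, C), mask.reshape(H, W)
```

Pixels where the returned mask is `False` received no source content. In the pipeline described above, these are the regions an inpainting network such as SVINet would be expected to fill.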

Paper Structure

This paper contains 21 sections, 16 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Visual examples. Given a single input image (the first row), our WarpGAN synthesizes images from five novel views: front, right, left, top, and down (the second to the sixth rows).
  • Figure 2: Overview of our WarpGAN, which consists of a 3D GAN inversion network and a style-based novel view inpainting network (SVINet). The "Forward warp" flow (blue arrows) illustrates the inference process of novel view synthesis. During model training, we also require the "Reverse warp" flow (red arrows) to warp the novel view image back to the original view for loss computation.
  • Figure 3: Comparisons of novel view synthesis on the CelebA-HQ dataset between our WarpGAN and several state-of-the-art methods.
  • Figure 4: Comparisons of different methods on the MEAD dataset for synthesizing images of the other four views (R60, R30, L30, and L60) using the front image as input.
  • Figure 5: (a) Qualitative comparisons of the Full Model with model variants "C", "D", and "E"; (b) Some failure cases; (c) Comparisons of image attribute editing effects with PTI and HFGI3D.
  • ...and 5 more figures