Improved Techniques for GAN based Facial Inpainting

Avisek Lahiri, Arnav Jain, Divyasri Nadendla, Prabir Kumar Biswas

TL;DR

This paper presents an end-to-end trainable parametric network that deterministically supplies good initial solutions for GAN based inpainting, leading to more photo-realistic reconstructions and a significant optimization speedup. It also shows how to efficiently extend GAN based single-image inpainter models to sequences.

Abstract

In this paper we present several architectural and optimization recipes for generative adversarial network (GAN) based facial semantic inpainting. Current benchmark models are sensitive to the initial solution of the non-convex optimization criterion used in GAN based inpainting. We present an end-to-end trainable parametric network that deterministically starts from good initial solutions, leading to more photo-realistic reconstructions with a significant optimization speedup. For the first time, we show how to efficiently extend GAN based single-image inpainter models to sequences by (a) learning to initialize a temporal window of solutions with a recurrent neural network and (b) imposing a temporal smoothness loss (during iterative optimization) to respect the redundancy in the temporal dimension of a sequence. We conduct comprehensive empirical evaluations on CelebA images and pseudo sequences, followed by real-life videos from the VidTIMIT dataset. The proposed method significantly outperforms the current GAN based state of the art in terms of reconstruction quality, with a simultaneous speedup of over 15$\times$. We also show that our proposed model is better at preserving facial identity in a sequence, even without explicitly using any face recognition module during training.
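The temporal smoothness idea in point (b) can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so the code below assumes one simple option: the mean squared difference between the latent codes of consecutive frames in a window. The function name `temporal_smoothness_loss` and the choice to penalize latent codes (rather than generated frames) are assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def temporal_smoothness_loss(z_window):
    """Assumed smoothness penalty over a temporal window of latent codes.

    z_window: array-like of shape (T, D), one latent vector per frame.
    Returns the mean over consecutive frame pairs of the squared
    Euclidean distance between their latent codes.
    """
    z_window = np.asarray(z_window, dtype=np.float64)
    if z_window.shape[0] < 2:
        # A single frame has no neighbour, so there is nothing to penalize.
        return 0.0
    diffs = z_window[1:] - z_window[:-1]          # shape (T-1, D)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    static = np.tile(rng.normal(size=(1, 100)), (5, 1))   # identical codes
    jittery = rng.normal(size=(5, 100))                   # unrelated codes
    print(temporal_smoothness_loss(static))   # identical codes give 0.0
    print(temporal_smoothness_loss(jittery))  # larger for unrelated codes
```

In an iterative optimization, a term of this kind would be added, with some weight, to the per-frame inpainting loss so that neighbouring frames are pulled toward similar solutions.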

Paper Structure

This paper contains 20 sections, 10 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Benefit of initializing Eq. \ref{eq_total_loss} with the proposed learned parametric network, $P_{\theta_z}$. (a) Visualization of initial solutions of Eq. \ref{eq_total_loss}. Row 1: original images; Row 2: corrupted images; Row 3: initial solutions using our proposed network, $P_{\theta_z}$; Row 4: initial solutions using Yeh et al. \cite{yeh2017semantic}. The proposed outputs are more photo-realistic than those of \cite{yeh2017semantic}. (b) Average PSNR after convergence of the iterative optimization. Left, right, top, and bottom masks damage the respective 50% of the frame. The central mask damages the central 50%, and freehand masks damage approximately 50% of the frame with freehand-drawn masks.
  • Figure 2: Final inpainted outputs after convergence of Eq. \ref{eq_total_loss}. Top row: 64$\times$64. Bottom row: 128$\times$128. For each triplet, left: masked image; middle: inpainting by Yeh et al. \cite{yeh2017semantic}; right: proposed inpainted output. The proposed outputs are more photo-realistic. \cite{yeh2017semantic} specifically suffers at 128$\times$128 resolution. More examples are provided in the supplementary document.
  • Figure 3: Proposed LSTM based joint initialization of $z$ vectors for a group of frames. See Sec. \ref{sec_lstm} for details of the architecture.
  • Figure 4: Convergence of (a) contextual loss and (b) perceptual loss of Eq. \ref{eq_total_loss} for a batch of samples.
  • Figure 5: Visualization of the consistency of inpainting pseudo sequences. A pseudo sequence is created by masking a given image with different corruption patterns. Ideally, we want an inpainter to yield exactly the same output for every frame of a given subject's pseudo sequence. Top: masked original pseudo sequence. Middle: sequence inpainted by Yeh et al. \cite{yeh2017semantic}. Bottom: proposed inpainted sequence. The proposed method yields a more consistent sequence w.r.t. facial appearance.
  • ...and 3 more figures
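
The evaluation setup described in the captions of Figure 1(b) and Figure 5 can be sketched as follows: half-frame masks that damage the left, right, top, or bottom 50% of an image, a pseudo sequence built by applying different masks to one image, and PSNR as the reconstruction metric. The helper names (`half_mask`, `pseudo_sequence`, `psnr`) and the convention that 1 marks a kept pixel and 0 a damaged one are assumptions for illustration. The central and freehand masks are left out because the captions do not specify their exact geometry.

```python
import numpy as np


def half_mask(height, width, side):
    """Binary mask (1 = kept pixel, 0 = damaged) hiding one half of the frame."""
    mask = np.ones((height, width), dtype=np.float64)
    if side == "left":
        mask[:, : width // 2] = 0.0
    elif side == "right":
        mask[:, width // 2 :] = 0.0
    elif side == "top":
        mask[: height // 2, :] = 0.0
    elif side == "bottom":
        mask[height // 2 :, :] = 0.0
    else:
        raise ValueError(f"unknown side: {side!r}")
    return mask


def pseudo_sequence(image, sides=("left", "right", "top", "bottom")):
    """Corrupt one image with several masks to form a pseudo sequence."""
    image = np.asarray(image, dtype=np.float64)
    frames = []
    for side in sides:
        mask = half_mask(image.shape[0], image.shape[1], side)
        if image.ndim == 3:
            mask = mask[..., None]  # broadcast over the channel axis
        frames.append(image * mask)
    return frames


def psnr(original, reconstructed, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.uniform(size=(64, 64, 3))      # stand-in for a face image
    frames = pseudo_sequence(face)
    for side, frame in zip(("left", "right", "top", "bottom"), frames):
        print(side, round(psnr(face, frame), 2))
```

An inpainter that is consistent in the sense of Figure 5 would map every frame of such a pseudo sequence to nearly the same reconstruction, so the PSNR between any two inpainted frames would be high.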