StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN

Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari, Junyong Noh

TL;DR

This work proposes multi-scale deep feature warping (MSDFW), which warps the intermediate features of a pre-trained StyleGAN at different resolutions to generate cinemagraphs automatically from a still landscape image.

Abstract

We propose a method that can generate cinemagraphs automatically from a still landscape image using a pre-trained StyleGAN. Inspired by the success of recent unconditional video generation, we leverage a powerful pre-trained image generator to synthesize high-quality cinemagraphs. Unlike previous approaches that mainly utilize the latent space of a pre-trained StyleGAN, our approach utilizes its deep feature space for both GAN inversion and cinemagraph generation. Specifically, we propose multi-scale deep feature warping (MSDFW), which warps the intermediate features of a pre-trained StyleGAN at different resolutions. By using MSDFW, the generated cinemagraphs are of high resolution and exhibit plausible looping animation. We demonstrate the superiority of our method through user studies and quantitative comparisons with state-of-the-art cinemagraph generation methods and a video generation method that uses a pre-trained StyleGAN.
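
To make the idea of warping features at several resolutions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single full-resolution displacement field that is block-averaged down to each feature resolution and rescaled, and it substitutes a simple nearest-pixel forward splat (colliding sources are averaged) for the splatting method used in the paper. The function names `resize_flow`, `splat`, and `warp_multiscale`, and the resolutions in the usage note, are hypothetical.

```python
import numpy as np

def resize_flow(flow, size):
    """Downsample an (H, W, 2) flow of (dx, dy) vectors to (size, size, 2).

    Block averaging is an assumption of this sketch. Vectors are rescaled so
    that they remain expressed in pixels of the smaller grid.
    """
    h, w, _ = flow.shape
    fh, fw = h // size, w // size
    blocks = flow[:size * fh, :size * fw].reshape(size, fh, size, fw, 2)
    pooled = blocks.mean(axis=(1, 3))
    return pooled * np.array([size / w, size / h])

def splat(feature, flow):
    """Forward-warp a (C, H, W) feature map by an (H, W, 2) flow.

    Nearest-pixel splatting: each source pixel is pushed to its rounded
    target, targets hit by several sources are averaged, and targets hit by
    none stay zero. This is a stand-in, not the paper's splatting scheme.
    """
    c, h, w = feature.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    out = np.zeros((c, h, w), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    np.add.at(weight, (ty[valid], tx[valid]), 1.0)
    for k in range(c):
        np.add.at(out[k], (ty[valid], tx[valid]), feature[k][valid])
    return out / np.maximum(weight, 1.0)

def warp_multiscale(features, flow):
    """Warp every feature map in a {resolution: (C, r, r) array} dict."""
    return {r: splat(f, resize_flow(flow, r)) for r, f in features.items()}
```

For example, calling `warp_multiscale({4: f4, 2: f2}, flow)` with an 8×8 `flow` moves the 4×4 features by half the full-resolution displacement and the 2×2 features by a quarter of it, so the same motion is applied consistently at every scale.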

Paper Structure

This paper contains 27 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Given a landscape image, StyleCineGAN generates a seamless cinemagraph at 1024$\times$1024 resolution. This figure contains video clips, so consider viewing it in Adobe Reader. The same results are also included in the supplementary video.
  • Figure 2: Overview of StyleCineGAN. Given an input landscape image $I$, our goal is to generate a cinemagraph using a fixed pre-trained StyleGAN $G$. We project the image into both latent codes $w^+$ and deep features $D^{10}$ of $G$. Using the deep features $D^*$, a mask predictor predicts a segmentation mask $S$. To animate the input image, we use a motion generator to predict the motion field $M$ from $I$. $M$ is refined using $S$. Through Euler integration, $M$ produces the future and past displacement fields $F_{0\to t}$ and $F_{N\to t}$. To synthesize cinemagraph frames, we add a DFW layer between the layers of $G$. DFW refers to Eqns. \ref{eqn:dfw1} and \ref{eqn:dfw2}. This modification enables the intermediate features of $G$ to be warped according to $F_{0\to t}$ and $F_{N\to t}$ using a joint splatting method at different resolutions, specifically for the StyleGAN layers indexed with $i\in [10, 12, 14, 16, 18]$. The warped deep features are used to synthesize frames $\hat{I}_t$, resulting in the final cinemagraph video.
  • Figure 3: Generated cinemagraph results. This figure contains video clips, so consider viewing it in Adobe Reader. The first two are cinemagraphs without appearance change, and the last two are cinemagraphs with appearance change.
  • Figure 4: Qualitative comparison with state-of-the-art cinemagraph generation methods. Please refer to the supplementary video for more examples.
  • Figure 5: Qualitative comparison with the state-of-the-art video generation method, MoCoGAN-HD (Tian et al., 2021). For more examples from this comparison, please refer to the supplementary video.
  • ...and 4 more figures
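
The Figure 2 caption states that Euler integration turns the motion field $M$ into the future and past displacement fields $F_{0\to t}$ and $F_{N\to t}$. The sketch below shows one common way to do this and is offered as an assumption rather than the paper's exact procedure: it uses the recurrence $F_{0\to t}(p) = F_{0\to t-1}(p) + M(p + F_{0\to t-1}(p))$ with $F_{0\to 0} = 0$, samples $M$ at the nearest pixel, and obtains the past field by integrating $-M$ for $N - t$ steps. The names `euler_integrate` and `displacement_fields` are hypothetical.

```python
import numpy as np

def euler_integrate(motion, steps):
    """Accumulate a static motion field into a displacement field.

    motion: (H, W, 2) array of per-pixel (dx, dy) velocities in pixels/frame.
    Returns the (H, W, 2) displacement of every pixel after `steps` frames.
    """
    h, w, _ = motion.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disp = np.zeros(motion.shape, dtype=np.float64)
    for _ in range(steps):
        # Where each pixel currently is, rounded to the nearest grid point
        # and clamped to the image (both are simplifications of this sketch).
        px = np.clip(np.rint(xs + disp[..., 0]), 0, w - 1).astype(int)
        py = np.clip(np.rint(ys + disp[..., 1]), 0, h - 1).astype(int)
        # Euler step: advance by the velocity found at the current position.
        disp = disp + motion[py, px]
    return disp

def displacement_fields(motion, t, n_frames):
    """Return (F_{0->t}, F_{N->t}) for frame t of an n_frames-long loop."""
    future = euler_integrate(motion, t)              # forward from frame 0
    past = euler_integrate(-motion, n_frames - t)    # backward from frame N
    return future, past
```

Having both fields available for each frame $t$ is what allows features warped forward from frame 0 and backward from frame $N$ to be combined into a loop.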