Controlling the Output of a Generative Model by Latent Feature Vector Shifting

Róbert Belanec, Peter Lacko, Kristína Malinovská

TL;DR

This work uses a pre-trained StyleGAN3 model that generates images of realistic human faces in relatively high resolution and combines it with a convolutional neural network classifier trained to classify the generated images according to binary facial features from the CelebA dataset.

Abstract

State-of-the-art generative models (e.g. StyleGAN3 \cite{karras2021alias}) often generate photorealistic images based on vectors sampled from their latent space. However, the ability to control the output is limited. Here we present a novel method of latent vector shifting for controlled modification of the output image, utilizing semantic features of the generated images. In our approach, we use a pre-trained StyleGAN3 model that generates images of realistic human faces in relatively high resolution. We complement the generative model with a convolutional neural network classifier, namely ResNet34, trained to classify the generated images according to binary facial features from the CelebA dataset. Our latent feature shifter is a neural network model whose task is to shift the latent vectors of a generative model in a specified feature direction. To train the latent feature shifter, we designed a dataset of pairs of latent vectors with and without a certain feature. We trained the latent feature shifter for multiple facial features and outperformed our baseline method in the number of generated images with the desired feature. Based on the evaluation, we conclude that our latent feature shifter approach was successful in controlling the output of the StyleGAN3 generator.
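
The training idea described in the abstract (learn a mapping from a latent vector without a feature to its counterpart with the feature) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the latent size `DIM`, the synthetic pairs produced by the hypothetical `make_pairs` helper, and the residual linear `LinearShifter` stand in for the real StyleGAN3 latents, the paired dataset, and the shifter network. Only the choice of an MSE loss follows the paper, whose Figure 3 caption reports validation MSE.

```python
import random

DIM = 4  # toy latent size (assumption); real StyleGAN3 latents are much larger


def make_pairs(n, direction, rng):
    """Synthetic stand-in for the paired dataset:
    (latent without the feature, latent with the feature)."""
    pairs = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        z_feat = [zi + di for zi, di in zip(z, direction)]
        pairs.append((z, z_feat))
    return pairs


class LinearShifter:
    """Hypothetical shifter: z -> z + W z + b (a residual linear map)."""

    def __init__(self):
        self.W = [[0.0] * DIM for _ in range(DIM)]
        self.b = [0.0] * DIM

    def __call__(self, z):
        return [
            z[i] + sum(self.W[i][j] * z[j] for j in range(DIM)) + self.b[i]
            for i in range(DIM)
        ]

    def sgd_step(self, z, target, lr):
        """One SGD step on the MSE loss; returns the loss before the update."""
        err = [o - t for o, t in zip(self(z), target)]
        for i in range(DIM):
            g = 2.0 * err[i] / DIM  # d(MSE) / d(output_i)
            for j in range(DIM):
                self.W[i][j] -= lr * g * z[j]
            self.b[i] -= lr * g
        return sum(e * e for e in err) / DIM


def mean_mse(model, pairs):
    """Average MSE between shifted latents and their targets."""
    total = 0.0
    for z, target in pairs:
        total += sum((o - t) ** 2 for o, t in zip(model(z), target)) / DIM
    return total / len(pairs)


rng = random.Random(0)
direction = [1.0, -0.5, 0.0, 2.0]  # made-up "feature direction" for the toy data
train = make_pairs(200, direction, rng)
val = make_pairs(50, direction, rng)

model = LinearShifter()
loss_before = mean_mse(model, val)
for epoch in range(10):
    for z, z_feat in train:
        model.sgd_step(z, z_feat, lr=0.05)
loss_after = mean_mse(model, val)
print(f"validation MSE: {loss_before:.4f} -> {loss_after:.2e}")
```

In this toy setting the feature is a constant offset, so a linear map is sufficient. The paper's shifter is a neural network, which presumably handles cases where the required shift depends on the input latent vector.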

Paper Structure

This paper contains 15 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Example of generating four sample pairs for the shifted latent vectors dataset from the original image (green border).
  • Figure 2: Diagram of the process of generating a latent vector that is shifted by each model trained on a different feature dataset, which should result in a latent vector representing all of the required features. The plus sign represents a vector concatenation operation.
  • Figure 3: Average validation MSE loss over ten training epochs for five different latent vector shifting model architectures (a-e).
  • Figure 4: Results of adding the eyeglasses feature to eleven random vectors. Each row represents a different approach to adding the feature.
  • Figure 5: Results of adding the male feature to eleven random vectors. Each row represents a different approach to adding the feature.
  • ...and 3 more figures