Feature Unlearning for Pre-trained GANs and VAEs

Saemi Moon, Seunghyuk Cho, Dongwoo Kim

TL;DR

This work unlearns a specific feature, such as hairstyle in facial images, from pre-trained generative models (GANs and VAEs), and shows that the unlearned model is more robust in the presence of malicious parties.

Abstract

We tackle the problem of feature unlearning from pre-trained image generative models: GANs and VAEs. Unlike the common unlearning task, where the unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle in facial images, from the pre-trained generative models. Because the target feature appears only in a local region of an image, unlearning the entire image from the pre-trained model may lose other details in the remaining region of the image. To specify which features to unlearn, we collect randomly generated images that contain the target features. We then identify a latent representation corresponding to the target feature and use it to fine-tune the pre-trained model. Through experiments on the MNIST, CelebA, and FFHQ datasets, we show that target features are successfully removed while the fidelity of the original models is preserved. Further experiments with an adversarial attack show that the unlearned model is more robust in the presence of malicious parties.

Paper Structure

This paper contains 35 sections, 6 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: Results of unlearning various features from a pre-trained StyleGAN model. We use the same latent vector to generate images from both the original and the unlearned models. Our method effectively unlearns the target feature while maintaining high image quality.
  • Figure 2: Illustration of the interface used to collect images containing the target feature from generated images. A user selects the images that contain the target feature to be unlearned. The selected and non-selected images serve as the positive and negative datasets for target feature identification.
  • Figure 3: Overall illustration of the generative model unlearning framework. Based on whether the randomly sampled vector ${\mathbf{z}}$ has the target feature, we use different loss functions to unlearn the target feature. $t$ refers to a threshold, and $\hat{{\mathbf{z}}}$ is the translated vector, i.e., $\hat{{\mathbf{z}}} = {\mathbf{z}} - ( \operatorname{proj}_{{\mathbf{z}}_e}({\mathbf{z}}) - t){\mathbf{z}}_e$.
  • Figure 4: Visualization of four different features before and after unlearning from pre-trained GAN models. All paired images in each column are generated from the same latent vector.
  • Figure 5: User study results for unlearning the 'Glasses' feature from StyleGAN.
  • ...and 9 more figures
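
The latent translation in the Figure 3 caption, $\hat{{\mathbf{z}}} = {\mathbf{z}} - ( \operatorname{proj}_{{\mathbf{z}}_e}({\mathbf{z}}) - t){\mathbf{z}}_e$, can be illustrated with a minimal NumPy sketch. Several things here are assumptions and not taken from the paper: $\operatorname{proj}_{{\mathbf{z}}_e}({\mathbf{z}})$ is read as the scalar coefficient of ${\mathbf{z}}$ along ${\mathbf{z}}_e$, a latent is treated as having the target feature when that coefficient is at least $t$, and the names `proj_coeff`, `has_target_feature`, and `translate` as well as the random vectors are hypothetical.

```python
import numpy as np


def proj_coeff(z, z_e):
    """Scalar coefficient of z along z_e (assumed reading of proj_{z_e}(z))."""
    return float(np.dot(z, z_e) / np.dot(z_e, z_e))


def has_target_feature(z, z_e, t):
    """Assumed decision rule: the feature is present when the coefficient reaches t."""
    return proj_coeff(z, z_e) >= t


def translate(z, z_e, t):
    """z_hat = z - (proj_{z_e}(z) - t) * z_e, following the Figure 3 caption."""
    return z - (proj_coeff(z, z_e) - t) * z_e


# Hypothetical data: a random target-feature direction and a random latent vector.
rng = np.random.default_rng(0)
z_e = rng.normal(size=8)
z = rng.normal(size=8)
t = 0.1

z_hat = translate(z, z_e, t)

# Under this reading, the translated latent sits exactly at the threshold
# along z_e, and the part of z orthogonal to z_e is left unchanged.
coeff_after = proj_coeff(z_hat, z_e)
orthogonal_before = z - proj_coeff(z, z_e) * z_e
orthogonal_after = z_hat - coeff_after * z_e
```

Under the assumed definition, the translation only moves the latent along ${\mathbf{z}}_e$, which matches the stated goal of removing the target feature while leaving the rest of the image intact.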