LatentSwap: An Efficient Latent Code Mapping Framework for Face Swapping

Changho Choi, Minho Kim, Junhyeok Lee, Hyoung-Kyu Song, Younggeun Kim, Seungryong Kim

TL;DR

LatentSwap tackles the inefficiency and data dependence of traditional face-swapping approaches by introducing a lightweight latent-code mixer that operates inside a pre-trained generator's latent space. The method trains on randomly sampled latent pairs and uses a pre-trained GAN inversion model at inference, enabling photorealistic, high-resolution swaps without additional datasets. The key contributions are the latent mixer architecture, a concise loss design with controllable trade-offs, and demonstrated applicability to 3D-aware generators like StyleNeRF, as well as downstream latent-space editing. This approach offers a practical, fast, and modular solution with broad potential for real-time applications and further 3D-aware extensions in face-related editing tasks.

Abstract

We propose LatentSwap, a simple face swapping framework that generates a face swap latent code for a given generator. Because it uses randomly sampled latent codes, our framework is lightweight and requires no datasets, only pre-trained models, and its training procedure is fast and straightforward. The loss objective consists of only three terms and can effectively control the face swap results between source and target images. By attaching a pre-trained GAN inversion model, which is independent of our model, and using the StyleGAN2 generator, our model produces photorealistic, high-resolution images comparable to those of other competitive face swap models. We show that our framework is applicable to other generators such as StyleNeRF, paving the way to 3D-aware face swapping, and is also compatible with other downstream StyleGAN2 generator tasks. The source code and models can be found at https://github.com/usingcolor/LatentSwap.

Paper Structure (16 sections, 4 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Face swapping results by LatentSwap. Given a source and a target image, the model replaces the identity of the target image (each row) with the identity of the source image (each column) while maintaining the attributes of the target image, such as background and lighting. More results can be found in the supplementary material.
  • Figure 2: Overall training scheme of the LatentSwap model. We sample source and target latent codes randomly from a normal distribution; these are mapped into $\mathcal{W}$ and then copied 18 times to form a vector in $\mathcal{W}+$ space. The $\mathcal{W}+$ space vector is passed to the latent mixer (whose detailed structure is shown in the black dotted box). The swapped latent codes are then fed into the generator (the StyleGAN2 pre-trained weights) at different layers to generate the final face-swapped image. Gradients still flow through the generator, but its weights remain frozen throughout training.
  • Figure 3: Schematic description of the ID loss. The source and swapped images are mapped to the Smooth-Swap (Kim et al., 2022) identity embedding space. The loss minimizes the cosine distance between the two identity embeddings.
  • Figure 4: Face swapping results of LatentSwap compared to other face swapping models on FF++. For the other models, we used the code and checkpoints from their official implementations. Compared to the other models, our model preserves both source identity and target attributes well.
  • Figure 5: Performance of the latent mixer applied to latent codes from different latent spaces. We used $\lambda = 10^1$ for $\mathcal{Z}$ and $\mathcal{W}$ space results, whereas for the $\mathcal{W}+$ space we used $\lambda = 10^2$. $\mathcal{W}+$ space results keep more target attributes compared to other latent spaces while maintaining source identity, which is not preserved for $\mathcal{Z}$ space results.
  • ...and 6 more figures
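
The captions of Figures 2 and 3 describe two small mechanical steps that may be easier to follow in code: copying a $\mathcal{W}$ vector 18 times to obtain a $\mathcal{W}+$ code, and scoring identity with a cosine distance between embeddings. The NumPy sketch below illustrates only those two steps and is not the authors' implementation. The latent width of 512, the embedding width of 128, the batch size, and the helper names `to_w_plus`, `cosine_distance`, and `mapping_network` are assumptions made for illustration; the mapping network and the identity embedder are replaced by placeholders.

```python
import numpy as np

NUM_LAYERS = 18   # number of copies in W+, as stated in the Figure 2 caption
LATENT_DIM = 512  # assumption: latent width commonly used with StyleGAN2
EMBED_DIM = 128   # assumption: arbitrary identity-embedding width


def to_w_plus(w, num_layers=NUM_LAYERS):
    """Copy each W vector num_layers times: (batch, dim) -> (batch, num_layers, dim)."""
    return np.repeat(w[:, None, :], num_layers, axis=1)


def cosine_distance(a, b, eps=1e-8):
    """Row-wise 1 - cosine similarity for two (batch, dim) arrays."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return 1.0 - np.sum(a_unit * b_unit, axis=1)


def mapping_network(z):
    """Hypothetical placeholder for the pre-trained Z -> W mapping network."""
    return z  # identity stand-in; the real network is a learned MLP


rng = np.random.default_rng(0)

# Source and target latent codes sampled from a normal distribution.
z_source = rng.standard_normal((4, LATENT_DIM))
z_target = rng.standard_normal((4, LATENT_DIM))

# Map to W, then replicate 18 times to obtain W+ codes.
w_plus_source = to_w_plus(mapping_network(z_source))
w_plus_target = to_w_plus(mapping_network(z_target))

# Stand-ins for identity embeddings of the source and swapped images.
id_source = rng.standard_normal((4, EMBED_DIM))
id_swapped = rng.standard_normal((4, EMBED_DIM))

# ID loss in the spirit of Figure 3: mean cosine distance over the batch.
id_loss = cosine_distance(id_source, id_swapped).mean()
```

In an actual training loop the same operations would be written in a framework with automatic differentiation, so that gradients can pass through the frozen generator and the identity embedder back to the latent mixer.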