Deepfake for the Good: Generating Avatars through Face-Swapping with Implicit Deepfake Generation

Georgii Stanishevskii, Jakub Steczkiewicz, Tomasz Szczepanik, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek

TL;DR

ImplicitDeepfake presents a practical pipeline to generate plausible 3D avatars from a single image by applying 2D deepfake or diffusion edits to training views and then training either Neural Radiance Fields or Gaussian Splatting. The method leverages NeRF for accurate volumetric rendering and GS for faster, sharper renders, with diffusion-based editing enabling text-conditioned avatar modification. Empirical results show GS often yields crisper visual quality and robustness to viewpoint variations, while NeRF remains capable but can blur under certain 2D inconsistencies; dynamic avatars are demonstrated via NerFace integration. The work enables next-generation avatar creation for virtual environments and gaming, while underscoring important societal and ethical considerations surrounding deepfake technologies.

Abstract

Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are Neural Radiance Fields (NeRFs) and Gaussian Splatting (GS). NeRFs encode an object's shape and color in neural network weights, using a handful of images with known camera positions to generate novel views. In contrast, GS provides accelerated training and inference without a decrease in rendering quality by encoding the object's characteristics in a collection of Gaussian distributions. These two techniques have found many use cases in spatial computing and other domains. On the other hand, the emergence of deepfake methods has sparked considerable controversy. Deepfakes are artificial intelligence-generated videos that closely mimic authentic footage. Using generative models, they can modify facial features, enabling the creation of altered identities or expressions that bear a remarkably realistic resemblance to a real person. Despite these controversies, deepfakes of sufficient quality can offer a next-generation solution for avatar creation and gaming. To that end, we show how to combine all these emerging technologies to obtain a more plausible outcome. Our ImplicitDeepfake uses a classical deepfake algorithm to modify all training images separately and then trains NeRF and GS on the modified faces. Such a simple strategy can produce plausible 3D deepfake-based avatars.
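
The two-stage strategy described in the abstract (edit every training view separately in 2D, then fit a 3D representation to the edited views) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `View`, `edit_views`, `build_avatar`, `toy_swap`, and `toy_trainer` are hypothetical names, the "face swap" is a toy pixel average, and the "trainer" only records what a real NeRF or GS trainer would receive.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy stand-in for a real pixel array


@dataclass(frozen=True)
class View:
    image: Image
    camera_pose: Tuple[float, ...]  # known camera parameters for this view


def edit_views(views: Sequence[View], target_face: Image,
               swap_face: Callable[[Image, Image], Image]) -> List[View]:
    # Each view is edited on its own; camera poses are carried over unchanged.
    return [replace(v, image=swap_face(v.image, target_face)) for v in views]


def build_avatar(views: Sequence[View], target_face: Image,
                 swap_face: Callable[[Image, Image], Image],
                 train_3d: Callable[[List[View]], Any]) -> Any:
    # train_3d stands in for a NeRF or GS trainer (hypothetical callable).
    return train_3d(edit_views(views, target_face, swap_face))


def toy_swap(image: Image, target_face: Image) -> Image:
    # Toy placeholder for a 2D deepfake: average each pixel with the target.
    return [[(a + b) / 2 for a, b in zip(row_i, row_t)]
            for row_i, row_t in zip(image, target_face)]


def toy_trainer(views: List[View]) -> dict:
    # Toy placeholder for 3D training: report the views and poses it was given.
    return {"num_views": len(views), "poses": [v.camera_pose for v in views]}


views = [View([[0.0, 0.0]], (0.0,)), View([[1.0, 1.0]], (90.0,))]
target = [[1.0, 1.0]]
avatar = build_avatar(views, target, toy_swap, toy_trainer)
```

Because the 2D edit only touches pixel data, the original camera poses remain valid for the 3D training stage, which is why the sketch keeps the 2D editor and the 3D trainer as independent, swappable callables.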

Paper Structure

This paper contains 15 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Thanks to neural rendering, we can aggregate information from 2D images to produce novel views of 3D objects (see the left-hand column, "Neural rendering on original images"). In ImplicitDeepfake, we apply a 2D deepfake guided by a single image and then use neural rendering to obtain a 3D deepfake (see the right-hand column, "Neural rendering on modified images").
  • Figure 2: ImplicitDeepfake takes a single image and a universal 3D model as input and produces a 3D avatar.
  • Figure 3: Comparison between ImplicitDeepfake trained on NeRF and GS. In the first column, we see the original input 3D avatars. Then, we present an image of the celebrity who is the target of deepfake. In the last two columns, we have the results obtained with the help of NeRF and GS. In general, GS provides more visually plausible renders.
  • Figure 4: ImplicitDeepfake returns satisfying results even when the source and target faces bear little resemblance to each other. In the following comparison, we present two versions of ImplicitDeepfake (NeRF- and GS-based) in a setup where the source and target are of different sexes.
  • Figure 5: Results of Implicit Diffusion for two different faces. Each row shows the original avatar and two final 3D models generated using two different prompts.
  • ...and 1 more figure