3D Engine-ready Photorealistic Avatars via Dynamic Textures

Yifan Wang, Ivan Molodetskikh, Ondrej Texler, Dimitar Dinev

TL;DR

The paper presents an end-to-end pipeline for creating photorealistic 3D avatars that are ready for standard 3D engines by using explicit mesh representations (FLAME 3DMM) and dynamically generated textures. It decouples identity and expression through a two-stage process, employing Clip-A for identity reconstruction and Clip-B for dynamic texture training, with static textures de-lit and outpainted before a lightweight dynamic texture network enhances key regions. A differentiable renderer bridges the texture pipeline with conventional 3D pipelines, enabling seamless integration in tools like Blender and game engines. The approach achieves competitive visual quality with substantially lower capture and computation demands than fully neural or NeRF-based avatars, and demonstrates practical deployment advantages for games, VR/AR, and telepresence, while acknowledging limitations in full-body coverage and ears/neck detail that warrant future work.

Abstract

As the digital and physical worlds become more intertwined, interest has grown in digital avatars that closely resemble their real-world counterparts. Current digitization methods used in 3D production pipelines require costly capture setups, making them impractical for mass use by ordinary consumers. Recent academic literature has found success in reconstructing humans from limited data using implicit representations (e.g., voxels used in NeRFs), which are able to produce impressive videos. However, these methods are incompatible with traditional rendering pipelines, making it difficult to use them in applications such as games. In this work, we propose an end-to-end pipeline that builds explicitly represented photorealistic 3D avatars using standard 3D assets. Our key idea is the use of dynamically generated textures to enhance realism and visually mask deficiencies in the underlying mesh geometry. This allows for seamless integration with current graphics pipelines while achieving visual quality comparable to state-of-the-art 3D avatar generation methods.

Paper Structure

This paper contains 18 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: A precisely fitted mesh is required for our subsequent rendering pipeline. To reconstruct a mesh, (a) the subject captures themselves using a mobile phone; (b) Neural Deferred Shading (NDS) is used to reconstruct the mesh $S_{raw}$; (d) a face portion $S_{crop}$ of $S_{raw}$ is extracted via a bounding box determined by $P_{mesh}$, which is back-projected from $P_{img}$ in $I_{key}$ (c); and (e) a 3DMM FLAME mesh is fitted to the face portion $S_{crop}$.
  • Figure 2: A reconstruction comparison between our adapted NDS and Metashape using the same set of COLMAP-calibrated images (under 20) extracted from Clip-A.
  • Figure 3: The video images from Clip-A (a) are used by the static texture estimation module (b) to generate a de-lit and outpainted static texture (c). Video images (d) and mesh parameters (e) from Clip-B are used to train a dynamic texture network (f), which produces a key-region dynamic texture (g) based on the head model parameters. The static and dynamic textures are combined using the blending mask and, together with the head model, fed into a differentiable renderer (h), the output of which can be regressed against the original images (i).
  • Figure 4: The expression parameters and view angles are fed into the MLP encoder (a) and reshaped (b) for compatibility with convolutional decoders. The number of channels is expanded with a convolutional layer (c) for more network capacity, and the result is then decoded into an RGB image (d).
  • Figure 5: Qualitative comparison of our technique with three state-of-the-art methods: NextFace [dib2021practical], Neural Head Avatar (NHA) [Grassal22_neural_head_avatars], and Instant Volumetric Head Avatars (INSTA) [zielonka2023instant]. See the supplementary video for more results.
  • ...and 4 more figures
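
The Figure 3 caption says the static and dynamic textures are combined with a blending mask before rendering. The sketch below illustrates one plausible form of that step, a per-texel linear interpolation. The function name `blend_textures`, the nested-list texture layout, and the linear formula are assumptions made for illustration; the caption does not specify how the paper's mask is applied.

```python
def blend_textures(static_tex, dynamic_tex, mask):
    """Blend two RGB textures texel by texel (hypothetical helper).

    Assumed formula: out = mask * dynamic + (1 - mask) * static,
    where mask is 1.0 inside the key regions and 0.0 elsewhere.
    Textures are lists of rows; each texel is an (r, g, b) tuple in [0, 1].
    """
    blended = []
    for static_row, dynamic_row, mask_row in zip(static_tex, dynamic_tex, mask):
        row = []
        for s, d, m in zip(static_row, dynamic_row, mask_row):
            # Interpolate each colour channel independently.
            row.append(tuple(m * dc + (1.0 - m) * sc for sc, dc in zip(s, d)))
        blended.append(row)
    return blended


# Tiny 1x3 example: grey static texture, red dynamic texture.
static_tex = [[(0.2, 0.2, 0.2)] * 3]
dynamic_tex = [[(1.0, 0.0, 0.0)] * 3]
mask = [[0.0, 0.5, 1.0]]  # static only, even mix, dynamic only

result = blend_textures(static_tex, dynamic_tex, mask)
```

With a mask value of 0 the static texel passes through unchanged, and with a value of 1 the dynamic texel replaces it. Intermediate values give a soft transition at the border of a key region, which is one common way to avoid visible seams.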