3D Engine-ready Photorealistic Avatars via Dynamic Textures
Yifan Wang, Ivan Molodetskikh, Ondrej Texler, Dimitar Dinev
TL;DR
The paper presents an end-to-end pipeline for creating photorealistic 3D avatars that work in standard 3D engines, built on an explicit mesh representation (the FLAME 3DMM) and dynamically generated textures. It decouples identity and expression in two stages: Clip-A is used for identity reconstruction and Clip-B for dynamic texture training. Static textures are de-lit and outpainted, and a lightweight dynamic texture network then enhances key regions. A differentiable renderer bridges the texture pipeline with conventional 3D pipelines, so the avatars can be used in tools such as Blender and in game engines. The approach achieves competitive visual quality with substantially lower capture and computation demands than fully neural or NeRF-based avatars, which makes it practical for games, VR/AR, and telepresence. The paper acknowledges limitations in full-body coverage and in ear and neck detail that warrant future work.
Abstract
As the digital and physical worlds become more intertwined, there has been considerable interest in digital avatars that closely resemble their real-world counterparts. Current digitization methods used in 3D production pipelines require costly capture setups, making them impractical for widespread consumer use. Recent academic literature has found success in reconstructing humans from limited data using implicit representations (e.g., voxels used in NeRFs), which are able to produce impressive videos. However, these methods are incompatible with traditional rendering pipelines, making it difficult to use them in applications such as games. In this work, we propose an end-to-end pipeline that builds explicitly represented photorealistic 3D avatars using standard 3D assets. Our key idea is to use dynamically generated textures to enhance realism and visually mask deficiencies in the underlying mesh geometry. This allows for seamless integration with current graphics pipelines while achieving visual quality comparable to state-of-the-art 3D avatar generation methods.
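
To make the dynamic-texture idea concrete, the following is a minimal sketch of how a per-frame texture patch might be blended into a static base texture in UV space before a standard engine samples it. It is not the authors' implementation: the function name, the rectangular patch layout, and the soft blending mask are all assumptions made for illustration, and the patch stands in for the output of whatever network generates the dynamic texture.

```python
import numpy as np

def composite_dynamic_texture(static_tex, dynamic_patch, mask, top_left):
    """Blend a per-frame patch into a copy of the static UV texture.

    All names and the rectangular layout are hypothetical.
    static_tex:    (H, W, 3) float array in [0, 1], the base texture.
    dynamic_patch: (h, w, 3) float array in [0, 1], per-frame region output.
    mask:          (h, w) float array in [0, 1]; 1 keeps the patch, 0 the base.
    top_left:      (row, col) of the patch inside the texture.
    """
    out = static_tex.copy()          # leave the static texture untouched
    r, c = top_left
    h, w = mask.shape
    alpha = mask[..., None]          # broadcast the mask over the RGB channels
    region = out[r:r + h, c:c + w]
    # Standard alpha blend: patch where the mask is 1, base where it is 0.
    out[r:r + h, c:c + w] = alpha * dynamic_patch + (1.0 - alpha) * region
    return out

# Toy usage: a black 8x8 texture with a half-weighted white 4x4 patch.
static = np.zeros((8, 8, 3))
patch = np.ones((4, 4, 3))
mask = np.full((4, 4), 0.5)
frame_tex = composite_dynamic_texture(static, patch, mask, (2, 2))
```

Because the result is an ordinary texture image applied to an ordinary mesh, a conventional renderer can consume it without modification, which is the integration property the abstract emphasizes.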
