FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces
Safa C. Medin, Gengyan Li, Ruofei Du, Stephan Garbin, Philip Davidson, Gregory W. Wornell, Thabo Beeler, Abhimitra Meka
TL;DR
FaceFolds introduces a radiance-manifold-based representation that models a dynamic face sequence with a single static set of $N$ manifolds and a time-conditioned UV texture, exporting to a layered mesh and view-independent texture video for real-time rendering in legacy graphics pipelines. By separating view-dependent and view-independent texture components and using differentiable ray-manifold intersections, the method achieves photorealistic renderings with far lower memory and compute demands than full neural radiance fields. The pipeline supports offline training on multi-view videos and runtime playback in Unity with sub-16 ms per-frame latency for high-resolution outputs, while offering controllable trade-offs via mesh and texture resolution. The approach demonstrates competitive quality against state-of-the-art neural renderers and enables practical deployment in real-time applications without ML inference during rendering, advancing accessible, high-fidelity 3D facial avatars for games and XR.
Abstract
3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with a minimal compute and memory footprint. It runs natively on commodity graphics software and hardware, and allows for a graceful trade-off between quality and efficiency. Our method utilizes recent advances in neural rendering, particularly learning discrete radiance manifolds to sparsely sample the scene to model volumetric effects. We achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as a temporal canonical texture. We export a single layered mesh and view-independent RGBA texture video that is compatible with legacy graphics renderers without additional ML integration. We demonstrate our method by rendering dynamic face captures of real actors in a game engine, at comparable photorealism to state-of-the-art neural rendering techniques at previously unseen frame rates.
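To illustrate how a layered mesh with an RGBA texture can reproduce volumetric effects without ML inference, the following is a minimal sketch of standard front-to-back "over" alpha compositing along one camera ray. It assumes the ray crosses each layer once, that samples are ordered nearest layer first, and that colors are not premultiplied by alpha. The function name `composite_layers` and the input layout are hypothetical, and the compositing used in the paper may differ in detail.

```python
def composite_layers(samples):
    """Composite RGBA samples taken where one ray crosses each mesh layer.

    samples: iterable of (r, g, b, a) tuples, nearest layer first,
             with all values in [0, 1] and non-premultiplied color.
    Returns ((r, g, b), accumulated_alpha).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer layers
    for r, g, b, a in samples:
        weight = transmittance * a  # contribution of this layer to the pixel
        rgb[0] += weight * r
        rgb[1] += weight * g
        rgb[2] += weight * b
        transmittance *= 1.0 - a
    return (rgb[0], rgb[1], rgb[2]), 1.0 - transmittance


# Hypothetical example: a half-transparent red layer in front of an opaque green one.
color, alpha = composite_layers([(1.0, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 1.0)])
```

In a rasterizer the same result is usually obtained by drawing the layers in depth order with fixed-function alpha blending, which is why such a representation can run in a legacy graphics pipeline.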
