FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces

Safa C. Medin, Gengyan Li, Ruofei Du, Stephan Garbin, Philip Davidson, Gregory W. Wornell, Thabo Beeler, Abhimitra Meka

TL;DR

FaceFolds introduces a radiance-manifold-based representation that models a dynamic face sequence with a single static set of $N$ manifolds and a time-conditioned UV texture, exporting to a layered mesh and view-independent texture video for real-time rendering in legacy graphics pipelines. By separating view-dependent and view-independent texture components and using differentiable ray-manifold intersections, the method achieves photorealistic renderings with far lower memory and compute demands than full neural radiance fields. The pipeline supports offline training on multi-view videos and runtime playback in Unity with sub-16 ms per-frame latency for high-resolution outputs, while offering controllable trade-offs via mesh and texture resolution. The approach demonstrates competitive quality against state-of-the-art neural renderers and enables practical deployment in real-time applications without ML inference during rendering, advancing accessible, high-fidelity 3D facial avatars for games and XR.
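The differentiable ray-manifold intersection mentioned above can be illustrated with a minimal sketch. It assumes, as a hypothetical setup, that each of the $N$ manifolds is a level set $\{x : s(x) = l_i\}$ of one scalar field $s$. The names `ray_level_set_intersections`, `scalar_field`, and `levels` are illustrative and do not come from the paper. The ray is sampled uniformly, sign changes of $s(x) - l_i$ are detected, and each crossing is refined by linear interpolation. Within each bracketing interval, the hit distance is therefore a smooth function of the sampled field values, which is what would let gradients flow back into a learned field.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def point_on_ray(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    """Evaluate origin + t * direction."""
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


def ray_level_set_intersections(
    scalar_field: Callable[[Vec3], float],  # hypothetical stand-in for a manifold predictor
    levels: Sequence[float],                # one level value per manifold
    origin: Vec3,
    direction: Vec3,
    t_near: float,
    t_far: float,
    num_samples: int = 64,
) -> List[Tuple[float, int]]:
    """Return (t, manifold_index) for each crossing of s(x) = level, sorted front to back."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    ts = [t_near + (t_far - t_near) * k / (num_samples - 1) for k in range(num_samples)]
    values = [scalar_field(point_on_ray(origin, direction, t)) for t in ts]
    hits: List[Tuple[float, int]] = []
    for idx, level in enumerate(levels):
        for k in range(num_samples - 1):
            a = values[k] - level
            b = values[k + 1] - level
            # A sign change between consecutive samples brackets one crossing.
            if (a < 0.0) != (b < 0.0):
                # Linear interpolation of the zero crossing inside [ts[k], ts[k + 1]].
                w = a / (a - b)
                hits.append((ts[k] + w * (ts[k + 1] - ts[k]), idx))
    hits.sort()
    return hits
```

For example, with the spherical field $s(x) = \lVert x \rVert$ and levels 1 and 2, a ray along the x-axis starting at $(-3, 0, 0)$ crosses the two concentric spheres at $t \approx 1, 2, 4, 5$.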

Abstract

3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with a minimal compute and memory footprint. It runs natively on commodity graphics software and hardware, and allows for a graceful trade-off between quality and efficiency. Our method builds on recent advances in neural rendering, in particular learning discrete radiance manifolds that sparsely sample the scene to model volumetric effects. We achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as a temporal canonical texture. We export a single layered mesh and a view-independent RGBA texture video that are compatible with legacy graphics renderers without additional ML integration. We demonstrate our method by rendering dynamic face captures of real actors in a game engine, achieving photorealism comparable to state-of-the-art neural rendering techniques at previously unseen frame rates.
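Rendering a layered mesh with an RGBA texture comes down to blending the texture samples where a view ray crosses each layer. The minimal sketch below assumes the standard "over" operator; the exact blending setup used by the paper is not stated here, and the function names are hypothetical. It computes the same pixel two ways. Front-to-back accumulation mirrors volumetric rendering, while back-to-front blending mirrors what a rasterizer does with depth-sorted transparent layers.

```python
from typing import Sequence, Tuple

RGBA = Tuple[float, float, float, float]
RGB = Tuple[float, float, float]


def composite_front_to_back(layers: Sequence[RGBA]) -> Tuple[RGB, float]:
    """Blend RGBA samples ordered nearest-first with the "over" operator.

    Returns the color composited over black (premultiplied) and the accumulated opacity.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through the layers seen so far
    for r, g, b, a in layers:
        weight = transmittance * a  # this layer's contribution to the pixel
        color[0] += weight * r
        color[1] += weight * g
        color[2] += weight * b
        transmittance *= 1.0 - a
    return (color[0], color[1], color[2]), 1.0 - transmittance


def composite_back_to_front(layers: Sequence[RGBA]) -> Tuple[RGB, float]:
    """Same pixel via farthest-first blending, as with sorted transparent geometry."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for r, g, b, a in reversed(layers):
        # new = a * src + (1 - a) * dst, applied per channel
        color = [a * c_new + (1.0 - a) * c_old for c_new, c_old in zip((r, g, b), color)]
        alpha = a + (1.0 - a) * alpha
    return (color[0], color[1], color[2]), alpha
```

Both orders produce the same pixel. This equivalence means that, provided the layers are depth-sorted, a layered mesh can be drawn with ordinary alpha blending in a game engine instead of running network inference per ray.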

Paper Structure

This paper contains 16 sections, 4 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Training and inference pipelines. Given a set of rays from the training cameras, we determine the intersection of these rays with a set of implicit manifolds predicted by a single manifold predictor. After transforming these intersections to UV-space coordinates, a texture predictor outputs RGBA texture maps conditioned on the video frame index. At inference time, we shoot rays from the surface of a designated hemisphere around the scene towards its center, obtaining a single geometry and a video texture. The view-dependent branch is bypassed to ensure that the appearance is fully diffuse.
  • Figure 2: Video texture visualization. We illustrate three frames (the 15th, 30th, and 45th) from the learned texture video of subject 002914589. For each frame, we show the full RGBA and the alpha-only UV-space texture maps in the top and bottom rows, respectively.
  • Figure 3: Free-viewpoint rendering on Unity. Our representation allows for free-viewpoint rendering of dynamic 3D volumes on consumer hardware. Please refer to the supplementary material for the videos.
  • Figure 4: Qualitative comparisons. Our method achieves comparable visual quality to state-of-the-art neural rendering techniques while facilitating very efficient rendering of dynamic sequences on legacy graphics software without any custom integration of ML pipelines.
  • Figure 5: Frame interpolation results. Interpolating between the learned latent codes of frame indices allows us to achieve high-quality temporal interpolation between training frames with performance comparable to other approaches. Original frames are highlighted in red.
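The inference step in the Figure 1 caption, which shoots rays from a hemisphere around the scene towards its center, can be sketched as follows. Several choices here are hypothetical because the caption does not specify them: the hemisphere's orientation (+z, assumed to face the front of the head), the regular grid in polar and azimuthal angle, and the function name `hemisphere_rays`.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def hemisphere_rays(center: Vec3, radius: float, n_elev: int, n_azim: int) -> List[Tuple[Vec3, Vec3]]:
    """Return (origin, unit_direction) pairs with origins on the +z hemisphere of the
    given radius around `center`, each ray pointing straight at `center`."""
    rays: List[Tuple[Vec3, Vec3]] = []
    for i in range(n_elev):
        # Polar angle measured from +z, kept strictly inside (0, pi/2).
        theta = (math.pi / 2.0) * (i + 0.5) / n_elev
        for j in range(n_azim):
            phi = 2.0 * math.pi * j / n_azim
            offset = (radius * math.sin(theta) * math.cos(phi),
                      radius * math.sin(theta) * math.sin(phi),
                      radius * math.cos(theta))
            origin = (center[0] + offset[0], center[1] + offset[1], center[2] + offset[2])
            # |offset| == radius, so dividing by radius yields a unit vector toward the center.
            direction = (-offset[0] / radius, -offset[1] / radius, -offset[2] / radius)
            rays.append((origin, direction))
    return rays
```

Each returned ray would then be intersected with the learned manifolds, and collecting those hits is one plausible way to obtain the single geometry the caption mentions. A uniform grid in angle packs origins more densely near the pole; a Fibonacci or area-uniform layout would spread them more evenly.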