Relightable Neural Actor with Intrinsic Decomposition and Pose Control

Diogo Luvizon, Vladislav Golyanik, Adam Kortylewski, Marc Habermann, Christian Theobalt

TL;DR

The paper tackles relighting and pose control for dynamic humans using a three-part framework: a pose-driven implicit geometry model to capture pose-dependent deformations, a neural intrinsic decomposition that yields UV-space normals, visibility, albedo, and roughness, and a neural renderer that combines these components with a microfacet BRDF and an environment map. It introduces a novel training pipeline that operates on multi-view video under static lighting and leverages a UV-based decomposition with NormalNet, VisibilityNet, and UVDeltaNet to enable pose-aware relighting and appearance editing. The authors also present Relightable Dynamic Actors, a real-world dataset with four identities under six lighting conditions, enabling quantitative evaluation with PSNR, SSIM, and LPIPS on novel poses and illumination. Results demonstrate state-of-the-art relighting quality, detailed self-shadows and wrinkles, and the ability to edit material properties via static UV maps, with limitations mainly around fine facial/hand details and clothing complexity. This work advances photorealistic human rendering for AR/VR/metaverse applications by providing a tractable, editable, and relightable neural actor trained from real multi-view data.
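The TL;DR says the renderer combines normals, visibility, albedo, and roughness with a microfacet BRDF and an environment map. The NumPy sketch below is a rough, non-authoritative illustration of that shading step. It evaluates a Cook-Torrance microfacet BRDF (GGX distribution, Schlick Fresnel, Schlick-GGX Smith geometry) plus a Lambertian diffuse term, and sums it over environment-map texels that are treated as directional lights and masked by per-direction visibility. Several details are assumptions for illustration, not the paper's exact formulation: the specific BRDF terms, the `roughness ** 2` remapping, the fixed dielectric `f0 = 0.04`, and the hypothetical function name `shade_point`. The paper also uses a neural renderer, which this sketch omits.

```python
import numpy as np


def normalize(v, eps=1e-12):
    """Normalize vectors along the last axis; zero vectors stay zero."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)


def shade_point(normal, view_dir, albedo, roughness, light_dirs, light_rgb,
                solid_angles, visibility, f0=0.04):
    """Shade one surface point under an environment map (illustrative sketch).

    normal, view_dir: (3,) vectors; view_dir points from the surface to the camera.
    albedo: (3,) diffuse RGB albedo; roughness: scalar in (0, 1].
    light_dirs: (L, 3) directions toward each environment-map texel.
    light_rgb: (L, 3) texel radiance; solid_angles: (L,) steradians per texel.
    visibility: (L,) in [0, 1]; 0 means the direction is occluded (self-shadow).
    """
    n = normalize(normal)
    v = normalize(view_dir)
    l = normalize(light_dirs)
    h = normalize(l + v)                              # half vectors, (L, 3)

    n_dot_l = np.clip(l @ n, 0.0, 1.0)                # (L,)
    n_dot_v = max(float(n @ v), 1e-4)
    n_dot_h = np.clip(h @ n, 0.0, 1.0)
    v_dot_h = np.clip(h @ v, 0.0, 1.0)

    alpha = roughness ** 2                            # assumed GGX remapping
    a2 = alpha ** 2
    # GGX / Trowbridge-Reitz normal distribution function
    d = a2 / (np.pi * (n_dot_h ** 2 * (a2 - 1.0) + 1.0) ** 2)
    # Schlick Fresnel with a scalar (dielectric) F0
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5
    # Smith geometry term, Schlick-GGX approximation with k = alpha / 2
    k = alpha / 2.0
    g_v = n_dot_v / (n_dot_v * (1.0 - k) + k)
    g_l = n_dot_l / (n_dot_l * (1.0 - k) + k + 1e-8)
    g = g_v * g_l

    specular = d * f * g / (4.0 * n_dot_v * n_dot_l + 1e-8)   # (L,)
    diffuse = albedo / np.pi                                  # (3,)
    brdf = diffuse[None, :] + specular[:, None]               # (L, 3)

    # Deterministic quadrature: sum BRDF * radiance * cos * visibility over texels
    weight = visibility * n_dot_l * solid_angles              # (L,)
    return (brdf * light_rgb * weight[:, None]).sum(axis=0)   # (3,) RGB
```

Self-shadowing enters only through `visibility`. Setting an entry to 0 removes that texel's contribution, which is why a per-point visibility estimate matters for the shadows the paper models.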

Abstract

Creating a controllable and relightable digital avatar from multi-view video with fixed illumination is a very challenging problem since humans are highly articulated, creating pose-dependent appearance effects, and skin as well as clothing require spatially varying BRDF modeling. Existing works on creating animatable avatars either do not focus on relighting at all, require controlled illumination setups, or try to recover a relightable avatar from very low-cost setups, i.e., a single RGB video, at the cost of severely limited result quality, e.g., shadows are not even modeled. To address this, we propose Relightable Neural Actor, a new video-based method for learning a pose-driven neural human model that can be relit, allows appearance editing, and models pose-dependent effects such as wrinkles and self-shadows. Importantly, for training, our method solely requires a multi-view recording of the human under a known but static lighting condition. To tackle this challenging problem, we leverage an implicit geometry representation of the actor with a drivable density field that models pose-dependent deformations, and derive a dynamic mapping between 3D and UV spaces, where normals, visibility, and materials are effectively encoded. To evaluate our approach in real-world scenarios, we collect a new dataset with four identities recorded under different lighting conditions, indoors and outdoors, providing the first benchmark of its kind for human relighting, and demonstrate state-of-the-art relighting results for novel human poses.
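The "dynamic mapping between 3D and UV spaces" can be illustrated with a generic closest-point projection. A 3D sample point is projected onto the nearest triangle of the posed body mesh, and that triangle's per-vertex UVs are interpolated with the barycentric weights of the projected point. The sketch below is an assumption-laden illustration, not the paper's implementation. It uses a brute-force search over all triangles and per-vertex UVs (real body templates often store per-corner UVs to handle seams). It also relies on Ericson's region-based closest-point-on-triangle test. The names `closest_barycentric` and `point_to_uv` are hypothetical. Re-running the projection on the mesh posed for each frame is what would make such a mapping pose-dependent.

```python
import numpy as np


def closest_barycentric(p, a, b, c):
    """Barycentric weights (wa, wb, wc) of the point on triangle abc closest to p.

    Region-based method from Ericson, Real-Time Collision Detection, Sec. 5.1.5.
    """
    ab, ac, ap = b - a, c - a, p - a
    d1, d2 = ab @ ap, ac @ ap
    if d1 <= 0 and d2 <= 0:
        return np.array([1.0, 0.0, 0.0])                 # vertex region A
    bp = p - b
    d3, d4 = ab @ bp, ac @ bp
    if d3 >= 0 and d4 <= d3:
        return np.array([0.0, 1.0, 0.0])                 # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        v = d1 / (d1 - d3)
        return np.array([1.0 - v, v, 0.0])               # edge region AB
    cp = p - c
    d5, d6 = ab @ cp, ac @ cp
    if d6 >= 0 and d5 <= d6:
        return np.array([0.0, 0.0, 1.0])                 # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        w = d2 / (d2 - d6)
        return np.array([1.0 - w, 0.0, w])               # edge region AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return np.array([0.0, 1.0 - w, w])               # edge region BC
    denom = 1.0 / (va + vb + vc)
    v, w = vb * denom, vc * denom
    return np.array([1.0 - v - w, v, w])                 # face interior


def point_to_uv(p, vertices, faces, uvs):
    """Project p onto the closest triangle and interpolate its UVs.

    vertices: (V, 3) posed mesh vertices; faces: (F, 3) vertex indices;
    uvs: (V, 2) per-vertex UVs (an assumption). Returns (uv, distance to surface).
    """
    best_dist, best_face, best_bary = np.inf, None, None
    for face in faces:                                   # brute force over triangles
        a, b, c = vertices[face]
        bary = closest_barycentric(p, a, b, c)
        dist = np.linalg.norm(p - bary @ vertices[face])
        if dist < best_dist:
            best_dist, best_face, best_bary = dist, face, bary
    return best_bary @ uvs[best_face], best_dist
```

The distance to the surface is returned alongside the UV coordinate because a point's offset from the mesh is often useful as an extra input when querying UV-space features. A spatial acceleration structure would replace the linear search in practice.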

Paper Structure

This paper contains 29 sections, 9 equations, 16 figures, and 3 tables.

Figures (16)

  • Figure 1: Our method learns a neural actor that is driven only by a 3D skeletal pose and allows rendering and editing of the actor's appearance under new lighting and poses not seen during training. Our approach models pose-dependent deformations and self-shadows, and performs intrinsic decomposition of dynamic humans through a pose-dependent mapping between 3D and UV spaces, which also enables editing the appearance and material properties at inference time. For training, our method only needs a multi-view video and an environment map.
  • Figure 1: Neural network architecture of the pose-driven geometry model adapted from Neural Actor [liu2021neural]. The RGB color branch is only used while training the geometry model and is discarded during relighting training and inference. Texture features (obtained from the texture map) are interpolated at the UV coordinate $(u,v)$ given by projecting 3D points onto the human mesh. The resulting feature vector $\psi$ (shown in purple) is also used in our UVDeltaNet. The symbol "$\sim$" refers to positional encoding as in NeRF [Mildenhall2020], and "$||$" refers to feature-wise concatenation.
  • Figure 2: Our method takes as input a 3D skeletal pose and a static environment map, and renders the neural actor from a virtual camera position. The pose-driven geometry (see the preliminaries section) learns an implicit density function, from which we compute the normal and visibility information. Our intrinsic decomposition (see the intrinsic decomposition section) disentangles normal, visibility, albedo, and roughness maps in UV space. The neural renderer (see the rendering section) outputs our prediction, which is supervised with the reference image. Yellow denotes learnable components and green denotes non-learnable components.
  • Figure 2: Neural network architecture of our NormalNet model. Each rectangular block is a 2D partial convolution. The symbol "$\uparrow$" refers to depth-to-space transformation, where the spatial resolution is increased by a factor of $2$ in each dimension, and "$||$" refers to feature-wise concatenation.
  • Figure 3: The unbiased depth (given by the paper's unbiased-depth equation) is projected into 3D coordinates, where we approximate the tangent plane at the intersection point and obtain the normal vector. Normals are sampled from multiple viewpoints and aggregated in the UV map.
  • ...and 11 more figures
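Figure 3's caption describes three steps for estimating normals: lift depth to 3D, approximate the local tangent plane, and aggregate the resulting normals in UV space. Below is a minimal sketch of those generic steps. It makes three assumptions: a pinhole camera with intrinsics `fx, fy, cx, cy`, forward finite differences as the tangent-plane approximation, and plain per-texel averaging. It takes a depth map as input (for example, one rendered from the density field). It does not reproduce the paper's unbiased-depth formula, and all function names are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to camera-space points (H, W, 3) with a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def normals_from_points(points):
    """Approximate the tangent plane with forward differences; its normal is du x dv.

    The last row and column have no forward neighbour and get a zero normal.
    """
    du = np.zeros_like(points)
    dv = np.zeros_like(points)
    du[:, :-1] = points[:, 1:] - points[:, :-1]          # tangent along image x
    dv[:-1, :] = points[1:, :] - points[:-1, :]          # tangent along image y
    n = np.cross(du, dv)
    length = np.linalg.norm(n, axis=-1, keepdims=True)
    n = np.where(length > 1e-12, n / np.maximum(length, 1e-12), 0.0)
    n[n[..., 2] > 0] *= -1.0                             # face the camera (it looks along +z)
    return n


def aggregate_in_uv(normals, uvs, res):
    """Average world-space normals (N, 3) sampled at UVs (N, 2) in [0, 1) into a res x res map."""
    acc = np.zeros((res, res, 3))
    count = np.zeros((res, res))
    ij = np.clip((uvs * res).astype(int), 0, res - 1)    # texel column (u) and row (v)
    np.add.at(acc, (ij[:, 1], ij[:, 0]), normals)        # unbuffered scatter-add
    np.add.at(count, (ij[:, 1], ij[:, 0]), 1.0)
    mean = acc / np.maximum(count, 1.0)[..., None]
    length = np.linalg.norm(mean, axis=-1, keepdims=True)
    return np.where(length > 1e-12, mean / np.maximum(length, 1e-12), 0.0), count
```

For multiple views, one would call `normals_from_points` per view and rotate the result by that camera's camera-to-world rotation. Each pixel's UV coordinate would then be looked up through the 3D-to-UV mapping, and all samples concatenated before calling `aggregate_in_uv`. Averaging in world space is what makes samples from different cameras comparable.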