NECA: Neural Customizable Human Avatar

Junjin Xiao, Qing Zhang, Zhan Xu, Wei-Shi Zheng

TL;DR

This work introduces NECA, an approach that learns a versatile human representation from monocular or sparse-view videos, enabling granular customization of pose, shadow, shape, lighting and texture.

Abstract

Human avatars have become a novel type of 3D asset with various applications. Ideally, a human avatar should be fully customizable to accommodate different settings and environments. In this work, we introduce NECA, an approach capable of learning a versatile human representation from monocular or sparse-view videos, enabling granular customization across aspects such as pose, shadow, shape, lighting and texture. The core of our approach is to represent humans in complementary dual spaces and predict disentangled neural fields of geometry, albedo and shadow, as well as external lighting, from which we are able to derive realistic renderings with high-frequency details via volumetric rendering. Extensive experiments demonstrate the advantage of our method over state-of-the-art methods in photorealistic rendering, as well as in various editing tasks such as novel pose synthesis and relighting. The code is available at https://github.com/iSEE-Laboratory/NECA.
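
To make the phrase "disentangled neural fields ... via volumetric rendering" concrete, the following is a minimal sketch of standard NeRF-style ray compositing, not the authors' implementation. It assumes that per-sample densities have already been derived from the geometry field (that conversion is omitted), and it assumes, purely for illustration, that a sample's color is the product of albedo, a scalar shading term from the lighting, and a scalar shadow term. The function names `shade` and `composite` are hypothetical.

```python
import math

def shade(albedo, shading, shadow):
    # ASSUMPTION: per-sample color = albedo * shading * shadow.
    # The actual composition used by NECA may differ.
    return tuple(a * shading * shadow for a in albedo)

def composite(densities, colors, deltas):
    """Standard volume-rendering quadrature along one ray.

    weight_i = T_i * (1 - exp(-sigma_i * delta_i)), where T_i is the
    transmittance accumulated over the samples in front of sample i.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha            # light left for later samples
    return tuple(pixel), weights

# Two samples on a ray: editing the shadow or lighting term changes the
# rendered pixel without touching albedo or geometry.
lit = [shade((0.8, 0.6, 0.5), 1.0, 1.0), shade((0.2, 0.2, 0.9), 1.0, 1.0)]
dim = [shade((0.8, 0.6, 0.5), 1.0, 0.5), shade((0.2, 0.2, 0.9), 1.0, 0.5)]
pixel_lit, _ = composite([2.0, 5.0], lit, [0.1, 0.1])
pixel_dim, _ = composite([2.0, 5.0], dim, [0.1, 0.1])
print(pixel_lit, pixel_dim)
```

Because the shadow factor enters each sample color linearly in this sketch, halving it halves the composited pixel while the compositing weights, which depend only on geometry, stay the same.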

Paper Structure

This paper contains 25 sections, 25 equations, 22 figures, and 8 tables.

Figures (22)

  • Figure 1: Neural customizable human avatar. Our method takes as input monocular or sparse multi-view videos and outputs disentangled human representations, including normal, albedo, shadow, and illumination. Such disentanglement enables control of the learned human avatar with arbitrary poses/viewpoints and various customization options such as adjusting shape, shadow, lighting and texture.
  • Figure 2: Overview of NECA. We first sample points along the camera ray and transform the query points from observation space to canonical space. Next, we query the pose-aware feature by projecting the points onto a factorized tri-plane that is optimized per pose. Then we construct the tangent space at the surface point on SMPL nearest to each query point, and obtain the subject-level feature by concatenating the local tangent-space coordinate with the learned latent code in the surface space. Finally, to enable flexible customization, we disentangle the neural fields into attributes including SDF, albedo and shadow, as well as a learnable environmental lighting, by decoding the extracted features with distinct MLPs. The entire network is trained in a self-supervised manner, with only photometric losses and normal regularization.
  • Figure 3: Local coordinate used by zhang2022ndf, ho2023custom, liu2021neural and the one used by us. (a) Previous works define their local coordinate in UV space. The three pink points are mapped to the same red point on the surface and thus share the same feature. This many-to-one problem is also mentioned in Xu_2022_CVPR. (b) We complement UV space with a local tangent space. Although the pink points are still mapped to the same surface point, they now have different coordinates in the local tangent space, so their features differ. As in previous works, our local coordinate is also rotation invariant.
  • Figure 4: Qualitative comparison of novel pose synthesis on the ZJU-MoCap dataset.
  • Figure 5: Qualitative comparison of relighting under novel pose and view on the ZJU-MoCap dataset. Given the original frame as reference, we compare with Relighting4D on the estimated normal and albedo, as well as on the generated relighting results. As shown, our method outperforms Relighting4D in both appearance disentanglement and relighting. Please see the supplementary material for more results.
  • ...and 17 more figures
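
The tangent-space idea in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' code: the helper names (`tangent_frame`, `local_coordinate`) are hypothetical, and the nearest surface point, its normal, and a tangent direction are assumed to be given (in practice they would come from the SMPL mesh). The sketch builds an orthonormal frame at the surface point and expresses the offset from that point to the query in this frame.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def tangent_frame(normal, tangent_hint):
    """Orthonormal (tangent, bitangent, normal) frame via Gram-Schmidt."""
    n = normalize(normal)
    # Remove the normal component from the hint so the tangent lies in-plane.
    t = normalize(sub(tangent_hint, tuple(dot(tangent_hint, n) * c for c in n)))
    b = cross(n, t)
    return t, b, n

def local_coordinate(query, surface_point, normal, tangent_hint):
    """Coordinates of `query` in the tangent frame at `surface_point`."""
    t, b, n = tangent_frame(normal, tangent_hint)
    d = sub(query, surface_point)
    return (dot(d, t), dot(d, b), dot(d, n))

# ASSUMPTION: all three queries share the same nearest surface point
# (for example, a convex mesh vertex), as in the many-to-one case of panel (a).
surface_point, normal, hint = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
for q in [(0.1, 0.0, 0.2), (-0.1, 0.0, 0.2), (0.0, 0.1, 0.2)]:
    print(q, "->", local_coordinate(q, surface_point, normal, hint))
```

Since the frame is built from the surface normal and tangent, applying the same rigid rotation to the query, the surface point, and the frame directions leaves the local coordinate unchanged, which matches the rotation invariance noted in the caption.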