
3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands

Xuan Huang, Hanhui Li, Zejun Yang, Zhisheng Wang, Xiaodan Liang

TL;DR

VA-NeRF addresses the challenge of single-view, generalizable NeRFs for interacting hands by introducing visibility-aware feature fusion and a visibility-guided adversarial loss. The method uses MANO-based hand meshes, dual encoders for geometry and texture, a deviant SDF for density, and a pixel-wise visibility discriminator to supervise unseen regions. The model is formulated as $f:(q,d,I) \to (c,\sigma)$, which maps a query point $q\in\mathbb{R}^3$ and a viewing direction $d\in\mathbb{R}^3$, conditioned on the input image $I$, to a color $c$ and a density $\sigma$. It fuses local and global features while leveraging the symmetry between the two hands. Evaluations on Interhand2.6M show state-of-the-art results on PSNR, SSIM, and LPIPS and demonstrate robustness to occlusions and large view variations.
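To make the formulation concrete: in a NeRF, the color of a pixel is obtained by querying $f$ at samples along the camera ray and compositing the returned $(c,\sigma)$ pairs with volume rendering. The sketch below shows the standard NeRF quadrature for a single ray. The `field` callable is a hypothetical stand-in for a trained VA-NeRF (with the input image $I$ and its extracted features assumed to be captured inside it), and uniform midpoint sampling with a unit-length direction is an assumption, not necessarily the sampling scheme used in the paper.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical stand-in for the trained field f:(q, d, I) -> (c, sigma);
# the input image I (and its features) is assumed to be captured in the closure.
Field = Callable[[Vec3, Vec3], Tuple[Vec3, float]]


def render_ray(field: Field, origin: Vec3, direction: Vec3,
               near: float, far: float, n_samples: int = 64) -> Vec3:
    """Composite (c, sigma) samples along one ray with the standard NeRF quadrature.

    Assumes `direction` has unit length, so distances along the ray equal t.
    """
    step = (far - near) / n_samples
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # midpoint of the i-th uniform bin
        q = (origin[0] + t * direction[0],
             origin[1] + t * direction[1],
             origin[2] + t * direction[2])
        color, sigma = field(q, direction)
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2])


def white_fog(q: Vec3, d: Vec3) -> Tuple[Vec3, float]:
    """Toy field: homogeneous white medium with constant density 2."""
    return (1.0, 1.0, 1.0), 2.0


# Usage: over [0, 1] this medium accumulates an opacity of 1 - exp(-2), about 0.865.
print(render_ray(white_fog, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.0, 1.0))
```

Because the per-sample weights telescope to one minus the final transmittance, an empty ray renders black and a fully opaque surface renders its own color.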

Abstract

Neural radiance fields (NeRFs) are promising 3D representations for scenes, objects, and humans. However, most existing methods require multi-view inputs and per-scene training, which limits their real-life applications. Moreover, current methods focus on single-subject cases, leaving scenes of interacting hands, which involve severe inter-hand occlusions and challenging view variations, unsolved. To tackle these issues, this paper proposes a generalizable visibility-aware NeRF (VA-NeRF) framework for interacting hands. Specifically, given an image of interacting hands as input, our VA-NeRF first obtains a mesh-based representation of the hands and extracts their corresponding geometric and textural features. Subsequently, a feature fusion module that exploits the visibility of query points and mesh vertices is introduced to adaptively merge the features of both hands, enabling the recovery of features in unseen areas. Additionally, our VA-NeRF is optimized together with a novel discriminator within an adversarial learning paradigm. In contrast to conventional discriminators that predict a single real/fake label for the synthesized image, the proposed discriminator generates a pixel-wise visibility map, providing fine-grained supervision for unseen areas and encouraging VA-NeRF to improve the visual quality of synthesized images. Experiments on the Interhand2.6M dataset demonstrate that our proposed VA-NeRF significantly outperforms conventional NeRFs. Project Page: https://github.com/XuanHuang0/VANeRF
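The visibility-aware fusion idea can be illustrated with a minimal sketch. It assumes, hypothetically, that both hands are MANO meshes sharing one vertex topology, so vertex i on one hand has a counterpart vertex i on the other, and that every vertex carries a feature vector and a visibility score in [0, 1] for the input view. Each vertex keeps its own feature where it is visible and borrows its counterpart's feature in proportion to its own occlusion; a query point then gathers features from its k nearest vertices by inverse-distance weighting. The function names, the weighting rule, and k are illustrative choices, not the paper's fusion module, which the abstract describes as adaptive.

```python
import math
from typing import List, Sequence

Feature = List[float]
Point = Sequence[float]


def fuse_with_counterpart(own_feats: Sequence[Feature], own_vis: Sequence[float],
                          other_feats: Sequence[Feature], other_vis: Sequence[float],
                          eps: float = 1e-6) -> List[Feature]:
    """Blend each vertex feature with the same-index vertex of the other hand.

    Hypothetical rule: keep the own feature where the vertex is visible and
    borrow the counterpart's feature in proportion to the own occlusion.
    """
    fused = []
    for f_own, v_own, f_oth, v_oth in zip(own_feats, own_vis, other_feats, other_vis):
        w_own = v_own
        w_oth = (1.0 - v_own) * v_oth  # only borrow what the other hand actually saw
        total = w_own + w_oth
        if total < eps:
            fused.append(list(f_own))  # neither copy observed: leave unchanged
        else:
            fused.append([(w_own * a + w_oth * b) / total for a, b in zip(f_own, f_oth)])
    return fused


def query_feature(q: Point, vertices: Sequence[Point], vertex_feats: Sequence[Feature],
                  k: int = 3, eps: float = 1e-8) -> Feature:
    """Inverse-distance-weighted feature of a query point from its k nearest vertices."""
    dists = [math.dist(q, v) for v in vertices]
    nearest = sorted(range(len(vertices)), key=dists.__getitem__)[:k]
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    dim = len(vertex_feats[0])
    return [sum(w * vertex_feats[i][j] for w, i in zip(weights, nearest)) / total
            for j in range(dim)]


# Usage: vertex 0 of the right hand is occluded, but its left-hand counterpart is visible.
right = fuse_with_counterpart([[1.0, 0.0], [1.0, 0.0]], [0.0, 1.0],
                              [[0.0, 1.0], [0.0, 1.0]], [1.0, 1.0])
print(right)  # [[0.0, 1.0], [1.0, 0.0]]
```

Weighting the borrowed feature by the counterpart's own visibility keeps an occluded vertex from copying a feature that was never observed on either hand.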

Paper Structure

This paper contains 12 sections, 5 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Compared with previous generalizable NeRFs, our visibility-aware NeRF not only (a) generates images of better quality, but also tackles challenging tasks such as (b) inpainting obstructed areas and (c) removing hands in interacting scenes.
  • Figure 2: The framework of VA-NeRF. It consists of two key components, both designed to leverage the visibility of 3D points: the visibility-aware feature fusion module, which estimates appropriate features for query points, and the visibility-guided adversarial learning strategy, which enhances the synthesized results.
  • Figure 3: Comparison between the traditional binary-class discriminator (left) and the proposed visibility-guided discriminator (right). Note that the visibility map is conditioned on the input view and whether the target image is real or synthesized.
  • Figure 4: Visual comparison of the proposed method against state-of-the-art methods. Results of the proposed method better preserve hand structures and textures.
  • Figure 5: Qualitative examples of novel-view rendering with large view variations (rotation angles > 30 degrees).
  • ...and 2 more figures
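Figure 3 contrasts a binary real/fake discriminator with one that outputs a pixel-wise visibility map. One plausible way to set up such losses is sketched below with per-pixel binary cross-entropy. The target assignment is an assumption: real target images are labeled 1 everywhere, while for synthesized images the target is the visibility map (1 where a pixel's surface is visible in the input view, 0 where it is unseen), so the discriminator learns to flag unseen regions and the renderer is pushed to make every pixel look real. Maps are plain nested lists to keep the example dependency-free; the function names are hypothetical.

```python
import math
from typing import List

Map = List[List[float]]  # H x W map of per-pixel values in [0, 1]


def pixel_bce(pred: Map, target: Map, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a predicted map and a target map."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def discriminator_loss(d_real: Map, d_fake: Map, visibility: Map) -> float:
    """Assumed targets: all ones for real images, the visibility map for synthesized ones."""
    ones = [[1.0] * len(row) for row in d_real]
    return pixel_bce(d_real, ones) + pixel_bce(d_fake, visibility)


def generator_adv_loss(d_fake: Map) -> float:
    """The renderer tries to make every pixel of its output be judged as real (1)."""
    ones = [[1.0] * len(row) for row in d_fake]
    return pixel_bce(d_fake, ones)


# Usage: a 1 x 2 image whose right pixel was unseen in the input view.
vis = [[1.0, 0.0]]
print(discriminator_loss([[0.9, 0.9]], [[0.9, 0.1]], vis))
print(generator_adv_loss([[0.9, 0.1]]))
```

Compared with a single real/fake scalar, the per-pixel target lets the adversarial gradient concentrate on the unseen pixels that the renderer has to hallucinate.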