Table of Contents
Fetching ...

Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars

Xuan Huang, Hanhui Li, Wanquan Liu, Xiaodan Liang, Yiqiang Yan, Yuhao Cheng, Chengqiang Gao

TL;DR

This paper introduces a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas, and devise an interaction-aware attention module and a self-adaptive Gaussian refinement module that enhance image rendering quality in areas with intra- and inter-hand interactions.

Abstract

In this paper, we propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs. Existing GS-based methods designed for single subjects often yield unsatisfactory results due to limited input views, various hand poses, and occlusions. To address these challenges, we introduce a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas. Particularly, to handle hand variations, we disentangle the 3D presentation of hands into optimization-based identity maps and learning-based latent geometric features and neural texture maps. Learning-based features are captured by trained networks to provide reliable priors for poses, shapes, and textures, while optimization-based identity maps enable efficient one-shot fitting of out-of-distribution hands. Furthermore, we devise an interaction-aware attention module and a self-adaptive Gaussian refinement module. These modules enhance image rendering quality in areas with intra- and inter-hand interactions, overcoming the limitations of existing GS-based methods. Our proposed method is validated via extensive experiments on the large-scale InterHand2.6M dataset, and it significantly improves the state-of-the-art performance in image quality. Project Page: \url{https://github.com/XuanHuang0/GuassianHand}.

Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars

TL;DR

This paper introduces a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas, and devise an interaction-aware attention module and a self-adaptive Gaussian refinement module that enhance image rendering quality in areas with intra- and inter-hand interactions.

Abstract

In this paper, we propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs. Existing GS-based methods designed for single subjects often yield unsatisfactory results due to limited input views, various hand poses, and occlusions. To address these challenges, we introduce a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas. Particularly, to handle hand variations, we disentangle the 3D presentation of hands into optimization-based identity maps and learning-based latent geometric features and neural texture maps. Learning-based features are captured by trained networks to provide reliable priors for poses, shapes, and textures, while optimization-based identity maps enable efficient one-shot fitting of out-of-distribution hands. Furthermore, we devise an interaction-aware attention module and a self-adaptive Gaussian refinement module. These modules enhance image rendering quality in areas with intra- and inter-hand interactions, overcoming the limitations of existing GS-based methods. Our proposed method is validated via extensive experiments on the large-scale InterHand2.6M dataset, and it significantly improves the state-of-the-art performance in image quality. Project Page: \url{https://github.com/XuanHuang0/GuassianHand}.

Paper Structure

This paper contains 26 sections, 5 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: We present a novel interaction-aware Gaussian splatting framework that creates animatable interacting hand avatars from a single image. These high-fidelity avatars support various applications, such as editing, animation, combination, duplication, re-scaling, and text-to-avatar conversion.
  • Figure 2: Paradigm comparison between existing one-shot hand avatar methods (a-c) and the proposed method (d). By decoupling the learning and fitting stages, our method leverages the advantages of learning-based methods (a, b) in modeling cross-subject hand priors, and the advantages of inversion-based methods (c) in one-shot fitting without the extra cost of network fine-tuning.
  • Figure 3: The architecture of the proposed interaction-aware Gaussian splatting network, of which the core components are the disentangled hand representation, the interaction detection module, the interaction-aware attention module, and the Gaussian refinement module labeled in green.
  • Figure 4: Qualitative comparisons with state-of-the-art methods. The input image is shown in the top-left grid labeled in red. The first row presents results without changing the pose from the input view (left) and an alternative view (right), while results in the remaining rows are with novel poses.
  • Figure 5: Visual examples of the ablation study on the proposed components in the hand-prior learning stage (top) and the one-shot fitting stage (bottom).
  • ...and 5 more figures