JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting

Zhoutao Sun, Xukun Shen, Yong Hu, Yuyou Zhong, Xueyang Zhou

TL;DR

This work introduces JGHand, a joint-driven, animatable hand avatar built on 3D Gaussian Splatting (3DGS) that renders photorealistic images in real time across arbitrary poses. A differentiable, zero-error skeleton transformation maps canonical hand Gaussians to any target pose and bone length, and Linear Blend Skinning then applies the resulting pose-aware deformation. A depth-based shadow layer simulates finger self-occlusion in real time. Identity priors, implemented through a trainable triplane feature and pose-aware offsets, allow personalized hand appearance without relying on explicit morphable-model parameters. Ablations and cross-dataset experiments show improved rendering quality and speed over state-of-the-art methods. The model is a candidate for integration into pose estimation and interactive applications, although it requires training data with complete hand textures.

Abstract

Since hands are the primary interface in daily interactions, modeling high-quality digital human hands and rendering realistic images is a critical research problem. Furthermore, considering the requirements of interactive and rendering applications, it is essential to achieve real-time rendering and driveability of the digital model without compromising rendering quality. Thus, we propose Jointly 3D Gaussian Hand (JGHand), a novel joint-driven 3D Gaussian Splatting (3DGS)-based hand representation that renders high-fidelity hand images in real time for various poses and characters. Distinct from existing articulated neural rendering techniques, we introduce a differentiable process for spatial transformations based on 3D key points. This process supports deformations from the canonical template to a mesh with arbitrary bone lengths and poses. Additionally, we propose a real-time shadow simulation method based on per-pixel depth to simulate self-occlusion shadows caused by finger movements. Finally, we embed the hand prior and propose an animatable 3DGS representation of the hand driven solely by 3D key points. We validate the effectiveness of each component of our approach through comprehensive ablation studies. Experimental results on public datasets demonstrate that JGHand achieves real-time rendering speeds with enhanced quality, surpassing state-of-the-art methods.
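
The TL;DR names Linear Blend Skinning (LBS) as the step that moves canonical Gaussians to the target pose. The following is a minimal sketch of generic LBS for a single point, not the authors' implementation: the function names, the 3x4 `[R | t]` transform layout, and the toy two-joint example are all assumptions made for illustration.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat34 = Sequence[Sequence[float]]  # assumed layout: 3x4 rigid transform [R | t]


def apply_transform(T: Mat34, p: Vec3) -> List[float]:
    """Apply a 3x4 rigid transform [R | t] to a 3D point."""
    return [
        T[r][0] * p[0] + T[r][1] * p[1] + T[r][2] * p[2] + T[r][3]
        for r in range(3)
    ]


def lbs_point(p: Vec3, weights: Sequence[float],
              transforms: Sequence[Mat34]) -> List[float]:
    """Generic LBS: weighted sum of the point transformed by each joint."""
    out = [0.0, 0.0, 0.0]
    for w, T in zip(weights, transforms):
        q = apply_transform(T, p)
        for k in range(3):
            out[k] += w * q[k]
    return out


# Toy example (hypothetical values): one joint stays put, the other
# translates by +2 along x, and the point is bound equally to both.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]
posed = lbs_point([1.0, 0.0, 0.0], [0.5, 0.5], [IDENTITY, SHIFT_X])
```

In this toy case the point ends up halfway between its two per-joint positions. In a 3DGS setting the same blend would be applied to each Gaussian center, with the per-joint transforms supplied by the skeleton transformation.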

Paper Structure

This paper contains 15 sections, 13 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: We present JGHand, an animatable 3DGS-based hand model driven solely by keypoints. (a) Given the 3D positions of the hand joints, we propose a transformation that converts the canonical pose to the input pose with zero error. (b) We propose a 3DGS-based framework that reconstructs the personalized hand appearance and achieves real-time, photorealistic rendering.
  • Figure 2: An overview of our proposed framework. Given a hand pose and a camera view from an RGB sequence, our method reconstructs an identity-specific hand avatar and renders a photorealistic hand image in real time. First, we compute the transformation based on the given hand pose. The 3D Gaussian attributes are estimated from the UVD coordinates of the canonical Gaussians. In this process, the position of each Gaussian is calculated using the transformation and the LBS algorithm. During hand image rendering, the depth value of each pixel is computed, followed by the simulation of self-occlusion shadows. The rendered image and the simulated shadows are then superimposed to produce the final output.
  • Figure 3: (a) is an illustration of hand joints and levels, and the node with level 0 is the root joint. (b) illustrates the planes defined by the root joint and the level 1 joints. (c) indicates the local coordinate systems for each joint point on one finger. (d) shows the rotation angles of a joint in the local coordinate systems. $b_{xz}$ is the projection of the bone vector $b$ onto the $xz$ plane. The abduction angle $\theta_a$ is the angle between $b$ and $b_{xz}$, and the flexion angle $\theta_f$ is the angle between $b_{xz}$ and the coordinate axis $z$.
  • Figure 4: (a) shows the joint positions and mesh of MANO with the mean pose and shape parameters. (b) shows a sampling point located inside the canonical pose mesh, along with the nearest mesh face to it. The diagram includes four triangles that represent partial facets of the mesh. The upper red points indicate the sampling points, while the lower ones mark the projection points. The dark blue triangle highlights the facet where the projection points are located. (c) is an illustration of the UVD coordinates of the sampling point.
  • Figure 5: The top row of images demonstrates the varying shadow effects in the palm area resulting from finger movements, while the bottom row visualizes a pixel depth-based convolution kernel used to process these shadows. The red point in (b) marks a pixel for which a shadow mask will be calculated, the gray area delineates the region to be sampled, and the black points identify several specific sampling points within this region. (c) presents a side view of a mesh in the same pose as in (b), with the points from (b) mapped directly onto the mesh surface.
  • ...and 6 more figures
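
The angle definitions in the Figure 3 caption can be sketched in a few lines. The code below assumes the bone vector $b$ is already expressed in the joint's local coordinate system; the signed `atan2` conventions, the function names, and the inverse mapping are illustrative assumptions, since the caption defines only the unsigned angles.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def bone_angles(b: Vec3) -> Tuple[float, float]:
    """Return (theta_a, theta_f) for a bone vector in local coordinates.

    theta_a: angle between b and its projection b_xz onto the xz plane.
    theta_f: angle between b_xz and the z axis.
    Sign conventions are an assumption: theta_a > 0 for +y, theta_f > 0 for +x.
    """
    bx, by, bz = b
    xz_norm = math.hypot(bx, bz)          # length of the projection b_xz
    theta_a = math.atan2(by, xz_norm)     # abduction, in [-pi/2, pi/2]
    theta_f = math.atan2(bx, bz)          # flexion, in (-pi, pi]
    return theta_a, theta_f


def bone_from_angles(theta_a: float, theta_f: float, length: float) -> Vec3:
    """Inverse of bone_angles under the same assumed conventions."""
    xz = length * math.cos(theta_a)
    return (xz * math.sin(theta_f), length * math.sin(theta_a),
            xz * math.cos(theta_f))


# Round trip on a hypothetical bone vector.
b = (1.0, 1.0, 1.0)
theta_a, theta_f = bone_angles(b)
b_rebuilt = bone_from_angles(theta_a, theta_f, math.sqrt(3.0))
```

Because the two angles plus the bone length determine $b$ exactly, the round trip recovers the original vector up to floating-point error. This is one way to see how a keypoint-only parameterization can carry both the pose and the bone lengths.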