Learning Localization of Body and Finger Animation Skeleton Joints on Three-Dimensional Models of Human Bodies

Stefan Novaković; Vladimir Risojević

TL;DR

The paper addresses localizing body and finger animation skeleton joints on 3D human models from point clouds when annotated real scans are scarce. It uses a DGCNN-based backbone to predict a matrix $A$ of per-joint convex-combination coefficients, so that each joint position is $\hat{\mathbf{j}}_k = \sum_{i=1}^N A_{i,k}\mathbf{p}_i$, with $A_{i,k}\ge0$ and $\sum_i A_{i,k}=1$, and trains with the loss $L = \sum_{k=1}^J \|\mathbf{j}_k - \hat{\mathbf{j}}_k\|_2^2$. Key contributions include a large synthetic dataset generated with MakeHuman using varied shape and pose parameters and leaf-bone length corrections, a compact architecture requiring minimal preprocessing, and state-of-the-art accuracy, especially for finger joints, along with faster inference (about 1.5 s per model). The work has practical impact for automated rigging and can be extended to real scans through pretraining and pose-augmentation strategies.

Abstract

Contemporary approaches to solving various problems that require analyzing three-dimensional (3D) meshes and point clouds have adopted the use of deep learning algorithms that directly process 3D data such as point coordinates, normal vectors and vertex connectivity information. Our work proposes one such solution to the problem of positioning body and finger animation skeleton joints within 3D models of human bodies. Due to scarcity of annotated real human scans, we resort to generating synthetic samples while varying their shape and pose parameters. Similarly to the state-of-the-art approach, our method computes each joint location as a convex combination of input points. Given only a list of point coordinates and normal vector estimates as input, a dynamic graph convolutional neural network is used to predict the coefficients of the convex combinations. By comparing our method with the state-of-the-art, we show that it is possible to achieve significantly better results with a simpler architecture, especially for finger joints. Since our solution requires fewer precomputed features, it also allows for shorter processing times.
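The convex-combination formulation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the point count, and the joint count are hypothetical, and random logits stand in for the network output. Only the per-joint softmax, the convex combination of input points, and the summed squared-error loss are taken from the description above.

```python
import numpy as np

def softmax_over_points(logits):
    """Per-joint softmax: normalize each column of an (N, J) array over the N points."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def joints_from_points(points, logits):
    """Joint k is sum_i A[i, k] * points[i], with A non-negative and columns summing to 1."""
    coeffs = softmax_over_points(logits)  # A, shape (N, J)
    return coeffs.T @ points              # shape (J, 3)

def joint_loss(joints_true, joints_pred):
    """Sum over joints of the squared Euclidean distance between true and predicted joints."""
    return float(np.sum((joints_true - joints_pred) ** 2))

# Hypothetical sizes; random values stand in for a real point cloud and network output.
rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))   # N = 1024 input points (assumed)
logits = rng.normal(size=(1024, 52))  # J = 52 joints (assumed)
joints_pred = joints_from_points(points, logits)
```

Because every joint is a convex combination of the input points, each estimate necessarily lies inside the convex hull of the point cloud, which is one reason this formulation suits joints located inside a body surface.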

Paper Structure (14 sections, 3 equations, 5 figures, 1 table)

Introduction
Related work
Point cloud analysis
Neural rigging
Architecture
Dataset
Synthetic data
Leaf bone length correction
Pose randomization
Remeshing and normalization
Experimental results
Training
Evaluation
Conclusion and further work

Figures (5)

Figure 1: Visualization of the input point cloud overlaid with the output joint estimations connected into a skeletal hierarchy (colored green), rendered using Blender viewport rendering.
Figure 2: Our proposed architecture based on the DGCNN backbone as provided in the published code of PointNeXt (Qian et al., 2022). As input for the DGCNN-based neural network, we provide the 3D point coordinates and their estimated normal vectors. The first EdgeConv layer computes edges based on $k$-nearest neighbour distances between points in 3D space. Subsequent EdgeConv layers compute edges based on distances in feature space. The output is a list of estimated joint positions. For each joint, its coordinates are computed as a convex combination of the input points. We employ a per-joint softmax layer to guarantee that, for each joint, the convex combination coefficients are non-negative and sum to $1$.
Figure 3: Results of evaluation on the test set by the PCJ metric. Our method achieves significant improvement in both body and finger joint localization.
Figure 4: Comparison of body joint visualizations, rendered using Blender viewport rendering: (a) ground truth, (b) TARig–TJM and (c) our method. Our method shows slight improvements, which are noticeable on the joints of the spine and the right knee joint.
Figure 5: Comparison of finger joint visualizations, rendered using Blender viewport rendering: (a) ground truth, (b) TARig–TJM estimations and (c) estimations by our method, which show considerable improvements compared to TARig–TJM estimations. TARig–TJM places the little finger joints inside the ring finger, while the ring finger joints are outside of the mesh. The estimations by our method are devoid of such significant issues.
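
The edge construction described in the Figure 2 caption can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the PointNeXt or DGCNN code: the function names are hypothetical, self-neighbours are excluded as a choice of this sketch, and the edge feature (the centre feature concatenated with the neighbour offset) follows the usual EdgeConv formulation rather than anything specified in this summary. Passing 3D coordinates to `knn_indices` mimics the first layer; passing learned features would mimic the later layers.

```python
import numpy as np

def knn_indices(vectors, k):
    """Indices of the k nearest neighbours of every row, by squared Euclidean distance."""
    sq_norms = np.sum(vectors ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (vectors @ vectors.T)
    np.fill_diagonal(dist2, np.inf)  # exclude each point from its own neighbourhood (sketch choice)
    return np.argsort(dist2, axis=1)[:, :k]

def edge_features(features, neighbour_idx):
    """EdgeConv-style edge features: concatenate x_i with (x_j - x_i) for each neighbour j."""
    centre = features[:, None, :]        # shape (N, 1, C)
    neighbours = features[neighbour_idx]  # shape (N, k, C)
    centre_tiled = np.broadcast_to(centre, neighbours.shape)
    return np.concatenate([centre_tiled, neighbours - centre], axis=-1)  # shape (N, k, 2C)

# Toy input: four points on a line, each with a made-up unit normal vector.
coords = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
normals = np.tile(np.array([0.0, 0.0, 1.0]), (4, 1))
features = np.concatenate([coords, normals], axis=1)  # (N, 6): coordinates plus normals

idx = knn_indices(coords, k=2)        # first layer: neighbours in 3D space
edges = edge_features(features, idx)  # shape (4, 2, 12)
```

In a full EdgeConv layer these edge features would pass through a shared MLP and be aggregated over the neighbour axis, typically with a maximum; that step is omitted here.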
