Learning Localization of Body and Finger Animation Skeleton Joints on Three-Dimensional Models of Human Bodies
Stefan Novaković, Vladimir Risojević
TL;DR
The paper addresses localizing body and finger animation joints on 3D human models given as point clouds, a task for which real annotated scans are scarce. It introduces a DGCNN-based backbone that, for $N$ input points $\mathbf{p}_i$ and $J$ joints, predicts per-joint convex-coefficient maps collected in a matrix $A \in \mathbb{R}^{N \times J}$. Each joint position is then the convex combination $\hat{\mathbf{j}}_k = \sum_{i=1}^N A_{i,k}\mathbf{p}_i$, with $A_{i,k}\ge0$ and $\sum_i A_{i,k}=1$. Training minimizes the loss $L = \sum_{k=1}^J \|\mathbf{j}_k - \hat{\mathbf{j}}_k\|_2^2$ between the ground-truth joints $\mathbf{j}_k$ and the predictions. Key contributions include a large synthetic dataset generated with MakeHuman by varying shape and pose, with leaf-bone corrections; a compact architecture requiring minimal preprocessing; and state-of-the-art accuracy, especially for finger joints, together with faster inference (about 1.5 s per model). The work has practical impact for automated rigging and can be extended to real scans through pretraining and pose-augmentation strategies.
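The joint-prediction formula and loss can be made concrete with a short NumPy sketch. It assumes the network emits raw per-point scores of shape (N, J) and that these are normalized with a softmax over the point axis. This is one standard way to satisfy $A_{i,k}\ge0$ and $\sum_i A_{i,k}=1$, not necessarily the paper's exact normalization. The names `convex_coefficients`, `predict_joints` and `joint_loss`, the point and joint counts, and the random inputs are all hypothetical.

```python
import numpy as np


def convex_coefficients(logits: np.ndarray) -> np.ndarray:
    """Map raw per-point scores (N, J) to coefficients A with A[i, k] >= 0
    and sum_i A[i, k] = 1 for every joint k.

    Assumption: softmax over the point axis (axis 0); the paper's exact
    normalization may differ.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    expd = np.exp(shifted)
    return expd / expd.sum(axis=0, keepdims=True)


def predict_joints(points: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Compute j_hat_k = sum_i A[i, k] * p_i for all J joints at once.

    points: (N, 3), coeffs: (N, J) -> joints: (J, 3)
    """
    return coeffs.T @ points


def joint_loss(target: np.ndarray, predicted: np.ndarray) -> float:
    """L = sum_k ||j_k - j_hat_k||_2^2, summed over all J joints."""
    return float(np.sum((target - predicted) ** 2))


# Usage with random stand-ins for a point cloud and network outputs.
rng = np.random.default_rng(0)
N, J = 2048, 5                        # hypothetical point and joint counts
points = rng.uniform(-1.0, 1.0, size=(N, 3))
logits = rng.normal(size=(N, J))      # stand-in for the network's per-point scores
A = convex_coefficients(logits)
joints = predict_joints(points, A)
loss = joint_loss(np.zeros((J, 3)), joints)
```

Because every predicted joint is a convex combination of input points, it always lies inside the convex hull of the point cloud. This keeps predictions anchored to the model's geometry instead of regressing free 3D coordinates.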
Abstract
Contemporary approaches to problems that require analyzing three-dimensional (3D) meshes and point clouds have adopted deep learning algorithms that directly process 3D data such as point coordinates, normal vectors, and vertex connectivity information. Our work proposes one such solution to the problem of positioning body and finger animation skeleton joints within 3D models of human bodies. Due to the scarcity of annotated real human scans, we resort to generating synthetic samples while varying their shape and pose parameters. Like the state-of-the-art approach, our method computes each joint location as a convex combination of input points. Given only a list of point coordinates and normal vector estimates as input, a dynamic graph convolutional neural network predicts the coefficients of the convex combinations. By comparing our method with the state of the art, we show that a simpler architecture can achieve significantly better results, especially for finger joints. Since our solution requires fewer precomputed features, it also allows for shorter processing times.
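The dynamic graph convolutional network mentioned above is built from EdgeConv layers. Each layer rebuilds a k-nearest-neighbour graph in the current feature space, forms edge features $[\mathbf{x}_i, \mathbf{x}_j - \mathbf{x}_i]$, applies a shared transform, and max-pools over neighbours. The NumPy sketch below shows one such layer on a 6-channel input (coordinates plus normals). It is an illustration under stated assumptions, not the paper's architecture. The single linear layer standing in for the shared MLP, the random weights, `k = 20`, and the helper names `knn_indices` and `edge_conv` are all hypothetical.

```python
import numpy as np


def knn_indices(features: np.ndarray, k: int) -> np.ndarray:
    """Indices (N, k) of each point's k nearest neighbours, excluding itself,
    measured by squared Euclidean distance in the current feature space."""
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] - 2.0 * features @ features.T + sq[None, :]
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def edge_conv(features: np.ndarray, weight: np.ndarray, bias: np.ndarray, k: int) -> np.ndarray:
    """One EdgeConv layer: edge features [x_i, x_j - x_i] -> shared linear map
    + ReLU -> max over the k neighbours.

    features: (N, C), weight: (2C, C_out), bias: (C_out,) -> (N, C_out)
    """
    idx = knn_indices(features, k)                            # graph rebuilt per layer ("dynamic")
    center = np.repeat(features[:, None, :], k, axis=1)       # (N, k, C)
    neighbours = features[idx]                                # (N, k, C)
    edges = np.concatenate([center, neighbours - center], axis=-1)  # (N, k, 2C)
    h = np.maximum(edges @ weight + bias, 0.0)                # hypothetical one-layer shared MLP + ReLU
    return h.max(axis=1)                                      # order-invariant max aggregation


# Usage: a random point cloud with unit normals as the 6-channel input.
rng = np.random.default_rng(1)
N, k, C_out = 512, 20, 64                     # hypothetical sizes
xyz = rng.normal(size=(N, 3))
normals = rng.normal(size=(N, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
x = np.concatenate([xyz, normals], axis=1)    # (N, 6): coordinates + normals
W = rng.normal(scale=0.1, size=(2 * x.shape[1], C_out))
b = np.zeros(C_out)
h1 = edge_conv(x, W, b, k)                    # (N, 64) per-point features
```

Stacking several such layers and ending with a per-point head of width $J$ would yield the (N, J) scores that a normalization step turns into convex coefficients. How many layers and which head the paper uses is not specified here.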
