A Unified Framework for Human-centric Point Cloud Video Understanding
Yiteng Xu, Kecheng Ye, Xiao Han, Yiming Ren, Xinge Zhu, Yuexin Ma
TL;DR
This work addresses the limited generalization of existing human-centric PVU methods by proposing UniPVU-Human, a unified framework that exploits human priors at the global, part, and point levels together with a self-supervised, semantic-guided spatio-temporal representation learning pipeline. It introduces two synthetic priors (HBSeg for body-part segmentation and HMFlow for motion flow) and a self-learning stage that masks body-part patches to learn geometry and dynamics without annotations, followed by hierarchical fine-tuning that fuses global, part, and motion-aware features for downstream tasks. Experiments on HuCenLife (action recognition) and LIP (3D pose estimation) show state-of-the-art performance, with ablations validating the contribution of each module and showing strong robustness in semi-supervised settings. The framework also provides two synthetic datasets (LiDARFlow-Human and LiDARPart-Human) to support future research and demonstrates that human-centric priors significantly improve transferability across diverse PVU tasks.
Abstract
Human-centric Point Cloud Video Understanding (PVU) is an emerging field focused on extracting and interpreting human-related features from sequences of human point clouds, with the aim of advancing downstream human-centric tasks and applications. Previous works usually tackle one specific task and rely on large amounts of labeled data, which limits their generalization capability. Because humans have specific characteristics, including the structural semantics of the human body and the dynamics of human motion, we propose a unified framework that makes full use of this prior knowledge and explores the inherent features of the data itself for generalized human-centric point cloud video understanding. Extensive experiments demonstrate that our method achieves state-of-the-art performance on various human-related tasks, including action recognition and 3D pose estimation. All datasets and code will be released soon.
