Occluded Human Pose Estimation based on Limb Joint Augmentation
Gangtao Han, Chunxiao Song, Song Wang, Hao Wang, Enqing Chen, Guanghui Wang
TL;DR
The paper tackles occluded human pose estimation by introducing Limb Joint Augmentation (LJA), which simulates occlusions during training, and a Dynamic Structure Loss (DSL), which uses limb graphs to enforce dependencies among adjacent joints. LJA places occlusion blocks over a subset of $v=\lceil\alpha V\rceil$ visible joints, each block having size $[h_o,w_o]=\beta [h,w]$. The total loss is $L_{DSL}=L_{MSE}+\lambda L_{LSL}$, where $L_{LSL}$ is computed on two limb graphs and $\lambda$ follows a dynamic schedule to stabilize training. Experiments on OCHuman and CrowdPose show consistent improvements across backbones without increasing inference cost. Ablations confirm that LJA and DSL make complementary contributions, with a step-based schedule for $\lambda$ performing best. The approach therefore offers practical robustness gains in real-world occlusion scenarios.
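The LJA step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes $V$ is the number of visible joints, $[h,w]$ is the size of the input image, blocks are centered on the chosen joints, and occluded pixels are filled with zeros. The function name `limb_joint_augment` and its default values for `alpha` and `beta` are hypothetical.

```python
import math
import numpy as np

def limb_joint_augment(image, joints, visible, alpha=0.3, beta=0.1, rng=None):
    """Cover a random subset of visible joints with occlusion blocks.

    image:   (H, W, C) array
    joints:  (K, 2) array of (x, y) pixel coordinates
    visible: (K,) boolean mask of visible joints
    Returns the augmented copy and the indices of the covered joints.
    """
    rng = np.random.default_rng() if rng is None else rng
    joints = np.asarray(joints, dtype=float)
    out = image.copy()
    h, w = image.shape[:2]  # assumption: [h, w] is the image size

    vis_idx = np.flatnonzero(visible)
    V = len(vis_idx)
    if V == 0:
        return out, np.array([], dtype=int)

    # v = ceil(alpha * V) joints are occluded (clamped to V for safety).
    v = min(V, math.ceil(alpha * V))
    chosen = rng.choice(vis_idx, size=v, replace=False)

    # Block size [h_o, w_o] = beta * [h, w], at least one pixel per side.
    h_o = max(1, int(round(beta * h)))
    w_o = max(1, int(round(beta * w)))

    for k in chosen:
        x, y = joints[k]
        top = int(round(float(y))) - h_o // 2
        left = int(round(float(x))) - w_o // 2
        # Clip the block to the image bounds.
        y0, y1 = max(0, top), min(h, top + h_o)
        x0, x1 = max(0, left), min(w, left + w_o)
        out[y0:y1, x0:x1] = 0  # assumption: constant zero fill
    return out, chosen
```

Because the augmentation only alters training images, a model trained this way runs unchanged at inference, which is consistent with the claim of no added inference cost.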
Abstract
Human pose estimation aims to locate specific human joints in images or videos. While existing deep learning-based methods achieve high localization accuracy, they often generalize poorly to occlusion scenarios. In this paper, we propose an occluded human pose estimation framework based on limb joint augmentation to improve the generalization of the pose estimation model on occluded human bodies. Specifically, occlusion blocks are first used to randomly cover limb joints of the human bodies in the training images, imitating scenes in which objects or other people partially occlude the body. Trained on the augmented samples, the pose estimation model is encouraged to accurately locate occluded keypoints based on the visible ones. To further enhance the localization ability of the model, we construct a dynamic structure loss function based on limb graphs, which explores the distribution of occluded joints by evaluating the dependence between adjacent joints. Extensive experiments on two occlusion datasets, OCHuman and CrowdPose, demonstrate significant performance improvements without additional computation cost during inference.
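As a rough illustration of how a loss of the form $L_{DSL}=L_{MSE}+\lambda L_{LSL}$ from the TL;DR might be assembled, the sketch below makes several assumptions that this summary does not confirm: the limb term is taken to be the MSE between summed heatmaps of the joints in each limb, the two limb graphs are placeholder index groups, and $\lambda$ decays by a fixed factor at hypothetical epoch milestones. All names and values here are illustrative only.

```python
import numpy as np

# Hypothetical limb graphs: each limb groups indices of adjacent joints.
# The actual graphs are defined in the paper; these are placeholders.
LIMB_GRAPH_A = [(0, 1), (1, 2)]
LIMB_GRAPH_B = [(0, 1, 2)]

def mse(a, b):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

def limb_structure_loss(pred, target, graphs):
    """Assumed form of L_LSL: MSE between summed heatmaps of each limb."""
    total, count = 0.0, 0
    for graph in graphs:
        for limb in graph:
            idx = list(limb)
            total += mse(pred[idx].sum(axis=0), target[idx].sum(axis=0))
            count += 1
    return total / count

def step_lambda(epoch, milestones=(100, 150), base=1.0, gamma=0.1):
    """Assumed step schedule: scale lambda by gamma at each milestone."""
    passed = sum(epoch >= m for m in milestones)
    return base * gamma ** passed

def dynamic_structure_loss(pred, target, epoch):
    """L_DSL = L_MSE + lambda * L_LSL on (K, H, W) joint heatmaps."""
    lam = step_lambda(epoch)
    graphs = (LIMB_GRAPH_A, LIMB_GRAPH_B)
    return mse(pred, target) + lam * limb_structure_loss(pred, target, graphs)
```

The limb term couples the predictions of adjacent joints, so an error on an occluded joint is penalized together with its visible neighbors. Whether $\lambda$ should decrease or increase over training is a design choice left open here.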
