Occluded Human Pose Estimation based on Limb Joint Augmentation

Gangtao Han, Chunxiao Song, Song Wang, Hao Wang, Enqing Chen, Guanghui Wang

TL;DR

The paper tackles occluded human pose estimation with two training-time components: Limb Joint Augmentation (LJA), which simulates occlusions, and a Dynamic Structure Loss (DSL), which uses limb graphs to enforce dependencies among adjacent joints. LJA places occlusion blocks over a subset of $v=\lceil\alpha V\rceil$ visible joints, with block size $[h_o,w_o]=\beta [h,w]$. The total loss is $L_{DSL}=L_{MSE}+\lambda L_{LSL}$, where $L_{LSL}$ is computed on two limb graphs and $\lambda$ follows a dynamic schedule to stabilize training. Experiments on OCHuman and CrowdPose show consistent improvements across backbones without increasing inference cost. Ablations confirm that LJA and DSL contribute complementary gains, with a step-based schedule for $\lambda$ performing best.
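The LJA step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `limb_joint_augment`, uniform random selection of joints, centering each block on its joint, taking $[h,w]$ as the input image size, and the constant `fill` value are all assumptions made here.

```python
import math
import numpy as np

def limb_joint_augment(image, joints, visible, alpha=0.15, beta=0.20, fill=0, rng=None):
    """Cover a random subset of visible joints with rectangular occlusion blocks.

    image:   (H, W) or (H, W, C) array
    joints:  (K, 2) array of (x, y) pixel coordinates
    visible: (K,) boolean array marking visible joints
    Returns the augmented copy and the indices of the occluded joints.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = image.shape[:2]

    # Block size [h_o, w_o] = beta * [h, w] (assumption: h, w are the image size).
    h_o = max(1, int(round(beta * h)))
    w_o = max(1, int(round(beta * w)))

    # Number of joints to occlude: v = ceil(alpha * V), V = number of visible joints.
    visible_idx = np.flatnonzero(visible)
    v = math.ceil(alpha * len(visible_idx))
    if v == 0:
        return out, np.empty(0, dtype=int)

    # Assumption: joints are drawn uniformly without replacement.
    chosen = rng.choice(visible_idx, size=v, replace=False)
    for k in chosen:
        x, y = joints[k]
        top = int(round(y)) - h_o // 2
        left = int(round(x)) - w_o // 2
        # Clip the block to the image borders.
        y0, y1 = max(0, top), min(h, top + h_o)
        x0, x1 = max(0, left), min(w, left + w_o)
        out[y0:y1, x0:x1] = fill
    return out, chosen
```

Because the augmentation only alters training images, a model trained this way runs unchanged at test time, which is consistent with the claim of no extra inference cost.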

Abstract

Human pose estimation aims to locate specific human joints in images or videos. While existing deep learning-based methods achieve high localization accuracy, they often generalize poorly to occlusion scenarios. In this paper, we propose an occluded human pose estimation framework based on limb joint augmentation to improve the generalization of the pose estimation model to occluded human bodies. Specifically, occlusion blocks are first used to randomly cover limb joints of the human bodies in the training images, imitating scenes in which objects or other people partially occlude the body. Trained on the augmented samples, the pose estimation model is encouraged to locate the occluded keypoints accurately from the visible ones. To further improve the localization ability of the model, this paper constructs a dynamic structure loss function based on limb graphs, which explores the distribution of occluded joints by evaluating the dependence between adjacent joints. Extensive experimental evaluations on two occlusion datasets, OCHuman and CrowdPose, demonstrate significant performance improvements without additional computation cost during inference.
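To make the structure loss concrete, the sketch below combines a heatmap MSE term with a limb term as $L_{DSL}=L_{MSE}+\lambda L_{LSL}$. The exact form of $L_{LSL}$ is not given in this summary, so the version here is an assumption: it penalizes the squared difference between predicted and ground-truth offset vectors of adjacent joints. The function names and argument layout are hypothetical.

```python
import numpy as np

def limb_structure_loss(pred_xy, gt_xy, edges):
    """Hypothetical limb term: MSE between offset vectors of adjacent joints.

    pred_xy, gt_xy: (N, K, 2) joint coordinates
    edges:          list of (i, j) index pairs for adjacent joints on a limb graph
    """
    i, j = np.asarray(edges).T
    pred_vec = pred_xy[:, i] - pred_xy[:, j]   # (N, E, 2) predicted limb offsets
    gt_vec = gt_xy[:, i] - gt_xy[:, j]         # (N, E, 2) ground-truth limb offsets
    return float(np.mean((pred_vec - gt_vec) ** 2))

def dynamic_structure_loss(pred_hm, gt_hm, pred_xy, gt_xy, edges, lam):
    """L_DSL = L_MSE + lam * L_LSL, with the limb term assumed as above."""
    l_mse = float(np.mean((pred_hm - gt_hm) ** 2))
    l_lsl = limb_structure_loss(pred_xy, gt_xy, edges)
    return l_mse + lam * l_lsl
```

Under this assumed form, the limb term depends only on relative joint positions: shifting every predicted joint by the same offset leaves it at zero, so absolute position is left to the MSE term.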

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: The overall framework. This paper proposes to simulate real occlusion scenes by generating occlusion blocks over the limb joints. The pose estimation network takes the occluded human frames as input and predicts the positions of all the joints. The dynamic structure loss explores the dependence between adjacent joints on limbs. With the augmented training samples and the constraint of human structure, the network generates accurate predictions for the occluded joints.
  • Figure 2: The visualization of limb joint augmentation (LJA). (a) Original images. (b) Occluded images generated by LJA. In the experiments, we set ${\alpha=0.15}$ and ${\beta=0.20}$. The pixel values of the occlusion area in the above images are set to 169 for clear visualization.
  • Figure 3: The visualization of human limb graphs with joints. This paper constructs two separate limb graphs: left wrist-left elbow-left shoulder-right shoulder-right elbow-right wrist, and left ankle-left knee-left hip-right hip-right knee-right ankle.
  • Figure 4: Visualization of pose estimation results after correction by dynamic limb structure loss.
  • Figure 5: The options of the weighting scheme, where ${\lambda}$ changes along with the increasing epoch.
  • ...and 1 more figure
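The two limb graphs in Figure 3 and a step-style schedule for ${\lambda}$ (one of the options in Figure 5) can be written down as follows. This is a sketch under stated assumptions: joint indices follow the COCO 17-keypoint ordering, and the milestones, the values, and the increasing direction in `step_lambda` are hypothetical placeholders, since this summary does not report the actual settings.

```python
# Assumed joint indices (COCO 17-keypoint ordering).
COCO = {
    "left_shoulder": 5, "right_shoulder": 6,
    "left_elbow": 7, "right_elbow": 8,
    "left_wrist": 9, "right_wrist": 10,
    "left_hip": 11, "right_hip": 12,
    "left_knee": 13, "right_knee": 14,
    "left_ankle": 15, "right_ankle": 16,
}

# The two chains described in Figure 3.
ARM_CHAIN = ["left_wrist", "left_elbow", "left_shoulder",
             "right_shoulder", "right_elbow", "right_wrist"]
LEG_CHAIN = ["left_ankle", "left_knee", "left_hip",
             "right_hip", "right_knee", "right_ankle"]

def chain_edges(chain, index=COCO):
    """Turn an ordered chain of joint names into (i, j) pairs of adjacent joints."""
    return [(index[a], index[b]) for a, b in zip(chain, chain[1:])]

def step_lambda(epoch, milestones=(100, 170), values=(0.0, 0.5, 1.0)):
    """Piecewise-constant weight: values[s] applies once s milestones have passed.

    Milestones and values are hypothetical, chosen only for illustration.
    """
    stage = sum(epoch >= m for m in milestones)
    return values[stage]
```

Each chain of six joints yields five edges, so the limb term is evaluated over ten adjacent-joint pairs in total.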