A comprehensive framework for occluded human pose estimation

Linhao Xu, Lin Zhao, Xinxin Sun, Di Wang, Guangyu Li, Kedong Yan

TL;DR

Occlusion is a major barrier to accurate human pose estimation for three reasons: occluded training data are scarce, features of the target person are easily confused with those of non-target individuals, and contextual and structural cues are lost. The authors introduce DAG, a comprehensive framework that addresses these challenges through data diversification, discriminative feature processing, and structure-guided refinement. It combines three components: Mask Joints with Instance Paste data augmentation, which simulates realistic occlusion; an Adaptive Discriminative Attention Module (ADAM), which reinforces target-centric features; and a Feature-Guided Multi-Hop GCN (FGMP-GCN), which exploits body-structure priors and multi-hop joint relations to recover occluded joints. Experiments on MSCOCO-RE, CrowdPose, and OCHuman show consistent gains over strong baselines. The authors emphasize the framework's generalizability and practical applicability and plan to release code and data.
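The data-augmentation idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `mask_joints` and `instance_paste` are hypothetical names, images are assumed to be H×W×C `uint8` arrays, keypoints a (K, 3) array of (x, y, visibility), and the pasted person is supplied as a pixel crop plus a binary mask. Masking fills a square around a few visible joints with a constant; pasting composites another person's pixels over the target. The keypoint labels stay unchanged, so the network still has to predict the hidden joints.

```python
import numpy as np


def mask_joints(image, keypoints, num_joints=2, patch=16, fill_value=0, rng=None):
    """Hypothetical sketch: hide up to `num_joints` randomly chosen visible
    joints by filling a `patch` x `patch` square around each with `fill_value`.

    image: (H, W, C) array; keypoints: (K, 3) array of (x, y, visibility).
    Returns a new image; the input image and the keypoint labels are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = out.shape[:2]
    visible = np.flatnonzero(keypoints[:, 2] > 0)
    if visible.size == 0:
        return out
    chosen = rng.choice(visible, size=min(num_joints, visible.size), replace=False)
    half = patch // 2
    for j in chosen:
        x, y = int(keypoints[j, 0]), int(keypoints[j, 1])
        # Clip the square to the image bounds before filling it.
        out[max(0, y - half):min(h, y + half), max(0, x - half):min(w, x + half)] = fill_value
    return out


def instance_paste(image, instance, inst_mask, top_left):
    """Hypothetical sketch: composite another person's pixels onto `image`.

    instance: (h, w, C) crop of the pasted person; inst_mask: (h, w) binary
    mask of that person; top_left: (x, y) paste position, which may lie partly
    outside the image (the paste is clipped).
    """
    out = image.copy()
    h, w = out.shape[:2]
    x, y = top_left
    ih, iw = instance.shape[:2]
    y0, y1 = max(0, y), min(h, y + ih)
    x0, x1 = max(0, x), min(w, x + iw)
    if y0 >= y1 or x0 >= x1:
        return out  # the paste falls entirely outside the image
    sub_inst = instance[y0 - y:y1 - y, x0 - x:x1 - x]
    sub_mask = inst_mask[y0 - y:y1 - y, x0 - x:x1 - x].astype(bool)
    region = out[y0:y1, x0:x1]  # a view into `out`, so writes land in `out`
    region[sub_mask] = sub_inst[sub_mask]
    return out


# Usage on toy data: one visible joint at (32, 32), a 10x10 "person" pasted near a corner.
img = np.full((64, 64, 3), 200, dtype=np.uint8)
kps = np.array([[32, 32, 2], [10, 10, 0]])  # only joint 0 is visible
person = np.full((10, 10, 3), 50, dtype=np.uint8)
person_mask = np.zeros((10, 10), dtype=np.uint8)
person_mask[2:8, 2:8] = 1
aug = instance_paste(mask_joints(img, kps, patch=8), person, person_mask, (60, 60))
```

Filling with a constant is only one choice; noise or a background crop would be equally plausible stand-ins for the masked region.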

Abstract

Occlusion presents a significant challenge in human pose estimation. The challenges posed by occlusion can be attributed to three factors: 1) Data: occluded human pose samples are difficult to collect and annotate. 2) Feature: occlusion can cause feature confusion because the target person and interfering individuals look highly similar. 3) Inference: robust inference becomes difficult because complete body-structure information is lost. Existing methods for occluded human pose estimation usually address only one of these factors. In this paper, we propose DAG (Data, Attention, Graph), a comprehensive framework that addresses the performance degradation caused by occlusion. Specifically, we introduce a mask joints with instance paste data augmentation technique to simulate occlusion scenarios. Additionally, we propose an Adaptive Discriminative Attention Module (ADAM) to enhance the features of target individuals. Furthermore, we present the Feature-Guided Multi-Hop GCN (FGMP-GCN), which exploits prior knowledge of body structure to improve pose estimation results. Extensive experiments on three benchmark datasets for occluded human pose estimation demonstrate that the proposed method outperforms existing methods. Code and data will be publicly available.
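The abstract does not describe ADAM's internals, so the following is only a generic stand-in: CBAM-style channel-then-spatial attention in NumPy, illustrating the broader idea of reweighting a backbone feature map so that some channels and locations count more than others. It is not the authors' module. The names `channel_attention`, `spatial_attention`, and `attend` are hypothetical, and the weights `w1`, `w2`, `a`, `b`, `c` would normally be learned; here they are plain arrays and scalars supplied by the caller.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel weights (generic stand-in, not ADAM itself).

    feat: (C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r.
    """
    def mlp(v):
        # Shared two-layer MLP with a ReLU bottleneck.
        return w2 @ np.maximum(0.0, w1 @ v)

    avg = feat.mean(axis=(1, 2))  # (C,) global average pool
    mx = feat.max(axis=(1, 2))    # (C,) global max pool
    return sigmoid(mlp(avg) + mlp(mx))  # (C,) weights in (0, 1)


def spatial_attention(feat, a, b, c):
    """Spatial gate from channel-wise mean and max maps. CBAM uses a 7x7 conv
    here; a 1x1 mix (scalars a, b and bias c) keeps this sketch short."""
    return sigmoid(a * feat.mean(axis=0) + b * feat.max(axis=0) + c)  # (H, W)


def attend(feat, w1, w2, a, b, c):
    """Channel-then-spatial reweighting of a (C, H, W) feature map."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, a, b, c)[None, :, :]


# Usage on a random toy feature map (C=8, H=W=5, reduction ratio r=4).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))
out = attend(feat, rng.standard_normal((2, 8)), rng.standard_normal((8, 2)), 1.0, 1.0, 0.0)
```

With all weights set to zero, both gates reduce to 0.5 and the output is simply a quarter of the input, which is a handy sanity check.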

Paper Structure

This paper contains 14 sections, 4 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Challenges in occluded pose estimation. (a) Keypoints Swap: due to interference from a non-target person, some keypoints are predicted in the wrong positions. (b) Keypoints Miss: a large occluded area causes the loss of contextual and structural information. In each panel, the left image shows the ground truth and the right image shows the detection results.
  • Figure 2: The framework of our proposed method, DAG. The input image undergoes data augmentation and is then fed into the backbone for feature extraction. The features are then passed to the adaptive discriminative attention module for enhancement, and the enhanced features are used to generate an initial pose. Finally, the initial pose is sent to the feature-guided multi-hop GCN for refinement and correction, producing the final pose.
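The refinement stage relies on multi-hop relations between joints, and the basic multi-hop graph operation can be illustrated with a small NumPy layer. This is a sketch under assumptions, not the authors' FGMP-GCN: it assumes the 17-keypoint COCO skeleton, builds one row-normalized adjacency per hop distance k = 0..K from breadth-first-search hop counts, and computes ReLU(Σ_k A_k X W_k). In a feature-guided variant, each joint's input row X[i] might concatenate the initial pose coordinates with backbone features sampled at that joint; that construction is an assumption and is omitted here. All function names are hypothetical.

```python
from collections import deque

import numpy as np

# Assumption: standard COCO 17-keypoint skeleton, 0-indexed
# (0 nose, 1-2 eyes, 3-4 ears, 5-6 shoulders, 7-8 elbows, 9-10 wrists,
#  11-12 hips, 13-14 knees, 15-16 ankles).
COCO_EDGES = [(15, 13), (13, 11), (16, 14), (14, 12), (11, 12), (5, 11),
              (6, 12), (5, 6), (5, 7), (6, 8), (7, 9), (8, 10), (1, 2),
              (0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]


def hop_distances(num_nodes, edges):
    """Shortest-path hop count between every pair of joints (BFS per source)."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.full((num_nodes, num_nodes), np.inf)
    for s in range(num_nodes):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def multi_hop_adjacency(dist, max_hop):
    """Stack of row-normalized k-hop adjacencies A_k for k = 0..max_hop."""
    mats = []
    for k in range(max_hop + 1):
        a = (dist == k).astype(float)
        deg = a.sum(axis=1, keepdims=True)
        mats.append(a / np.maximum(deg, 1.0))  # rows with no k-hop neighbor stay zero
    return np.stack(mats)  # (max_hop + 1, N, N)


def multi_hop_gcn_layer(x, adj_stack, weights):
    """ReLU(sum_k A_k @ x @ W_k). x: (N, F_in); weights: (max_hop + 1, F_in, F_out)."""
    out = sum(a @ x @ w for a, w in zip(adj_stack, weights))
    return np.maximum(out, 0.0)


# Usage with toy features: 17 joints, 4 input features, 2 output features, hops 0..2.
dist = hop_distances(17, COCO_EDGES)
adj = multi_hop_adjacency(dist, max_hop=2)
x = np.ones((17, 4))
w = np.ones((3, 4, 2))
h = multi_hop_gcn_layer(x, adj, w)
```

Giving each hop its own weight matrix lets distant but structurally related joints (for example, a wrist and its shoulder two hops away) contribute directly, which is the intuition behind using multi-hop relations to recover occluded joints.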