A comprehensive framework for occluded human pose estimation
Linhao Xu, Lin Zhao, Xinxin Sun, Di Wang, Guangyu Li, Kedong Yan
TL;DR
Occlusion degrades human pose estimation for three reasons: occluded training data is scarce, features of the target person are confused with those of non-target individuals, and contextual cues about body structure are lost. The authors introduce DAG (Data, Attention, Graph), a framework that addresses each factor with a dedicated component: Mask Joints with Instance Paste data augmentation, an Adaptive Discriminative Attention Module (ADAM), and a Feature-Guided Multi-Hop GCN (FGMP-GCN). The augmentation simulates realistic occlusion, ADAM reinforces target-centric features, and FGMP-GCN exploits body priors and multi-hop relations to recover occluded joints. Experiments on MSCOCO-RE, CrowdPose, and OCHuman show consistent gains over strong baselines. The work emphasizes generalizability and practical applicability, and the authors plan to release code and data.
Abstract
Occlusion presents a significant challenge in human pose estimation. The difficulty can be attributed to three factors: 1) Data: Occluded human pose samples are hard to collect and annotate. 2) Feature: Occlusion causes feature confusion because the target person and interfering individuals look highly similar. 3) Inference: Robust inference is difficult once complete body structural information is lost. Existing methods designed for occluded human pose estimation usually address only one of these factors. In this paper, we propose a comprehensive framework, DAG (Data, Attention, Graph), to address the performance degradation caused by occlusion. Specifically, we introduce a data augmentation technique, mask joints with instance paste, to simulate occlusion scenarios. Additionally, we propose an Adaptive Discriminative Attention Module (ADAM) to enhance the features of target individuals. Furthermore, we present the Feature-Guided Multi-Hop GCN (FGMP-GCN) to exploit prior knowledge of body structure and refine pose estimation results. Extensive experiments on three benchmark datasets for occluded human pose estimation demonstrate that the proposed method outperforms existing methods. Code and data will be publicly available.
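
To make the augmentation idea concrete, the following is a minimal sketch of how masking joints and pasting another instance could simulate occlusion. The abstract does not specify the procedure, so everything here is an assumption for illustration: the function names `mask_joints` and `paste_instance`, the square patch shape, the constant fill value, the `(x, y, visible)` joint layout, and the use of nested lists as a stand-in for an image.

```python
import random


def mask_joints(image, joints, num_mask=2, patch=2, fill=0, rng=None):
    """Fill square patches around randomly chosen visible joints.

    Hypothetical helper, not the authors' implementation.
    image:  list of rows, each a list of pixel values (H x W).
    joints: list of (x, y, visible) tuples in pixel coordinates.
    Returns the augmented copy and the sorted indices of the masked joints.
    """
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in image]  # leave the input image untouched
    height, width = len(out), len(out[0])
    visible = [i for i, (_, _, v) in enumerate(joints) if v]
    chosen = rng.sample(visible, min(num_mask, len(visible)))
    for i in chosen:
        x, y, _ = joints[i]
        # Clip the patch to the image bounds.
        for yy in range(max(0, y - patch), min(height, y + patch + 1)):
            for xx in range(max(0, x - patch), min(width, x + patch + 1)):
                out[yy][xx] = fill
    return out, sorted(chosen)


def paste_instance(image, instance, mask, top, left):
    """Paste the masked pixels of another instance onto a copy of the image.

    Hypothetical helper: `instance` and `mask` are equally sized grids, and
    only pixels whose mask entry is truthy are copied. Pixels that would
    land outside the image are skipped.
    """
    out = [row[:] for row in image]
    height, width = len(out), len(out[0])
    for dy, (inst_row, mask_row) in enumerate(zip(instance, mask)):
        for dx, (pixel, keep) in enumerate(zip(inst_row, mask_row)):
            y, x = top + dy, left + dx
            if keep and 0 <= y < height and 0 <= x < width:
                out[y][x] = pixel
    return out
```

In a training pipeline, one plausible use is to apply either function with some probability per sample while leaving the ground-truth joint coordinates unchanged, so the network must infer the hidden joints from context. Whether the authors do this is not stated in the abstract.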
