OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh
TL;DR
The paper addresses realtime multi-person 2D pose estimation by introducing Part Affinity Fields (PAFs), a bottom-up, nonparametric representation that encodes the location and orientation of limbs so that detected body parts can be associated with the right person even when many people are present. A multi-stage CNN jointly predicts PAFs and body-part confidence maps, and a greedy parsing algorithm assembles the poses without costly global optimization. The resulting open-source system, OpenPose, runs in realtime on body, foot, hand, and facial keypoints, reports strong results on MPII and COCO, and can be extended to multi-view 3D. The authors also release a foot keypoint dataset and show that a combined body and foot detector is faster than running the two detectors sequentially while maintaining the accuracy of each.
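To make the association step concrete, here is a minimal sketch in plain Python, not the OpenPose implementation. It assumes the PAF for one limb type is available as two 2D grids (x and y components) and that candidate keypoints have already been extracted from the confidence maps. The names `paf_score`, `greedy_match`, `num_samples`, and `min_score` are hypothetical and chosen for illustration only. The score approximates a line integral of the field along the candidate limb, and the matching simply accepts the highest-scoring pairs first.

```python
import math


def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Approximate the line integral of a PAF along the segment p1 -> p2.

    paf_x, paf_y: 2D lists (rows indexed by y, columns by x) holding the
    x and y components of one limb type's affinity field.
    p1, p2: (x, y) candidate keypoint locations, assumed to lie in bounds.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / max(num_samples - 1, 1)
        x = int(round(p1[0] + t * dx))
        y = int(round(p1[1] + t * dy))
        # Dot product of the field with the limb direction at this sample.
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples


def greedy_match(starts, ends, paf_x, paf_y, min_score=0.05):
    """Greedily pair candidates for one limb type, best score first.

    min_score is an illustrative threshold, not a value from the paper.
    """
    scored = []
    for i, a in enumerate(starts):
        for j, b in enumerate(ends):
            s = paf_score(paf_x, paf_y, a, b)
            if s > min_score:
                scored.append((s, i, j))
    scored.sort(reverse=True)
    used_starts, used_ends, pairs = set(), set(), []
    for s, i, j in scored:
        # Each candidate may take part in at most one limb of this type.
        if i not in used_starts and j not in used_ends:
            pairs.append((i, j))
            used_starts.add(i)
            used_ends.add(j)
    return pairs
```

Calling `greedy_match` once per limb type and then merging pairs that share a keypoint would yield one skeleton per person. Because each limb type is matched independently and greedily, the cost grows with the number of candidate pairs rather than requiring a joint optimization over all people.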
Abstract
Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
