DeepPose: Human Pose Estimation via Deep Neural Networks
Alexander Toshev, Christian Szegedy
TL;DR
DeepPose reframes human pose estimation as a regression problem solved by a convolutional neural network and compounds it with a cascade of refiners to achieve high-precision joint localization. By regressing from the full image and subsequently focusing on higher-resolution sub-images, the method captures global context while enabling fine-grained refinement. The approach achieves state-of-the-art or better results on FLIC and LSP and demonstrates strong cross-dataset generalization to related datasets, highlighting robust pose representations learned by a generic CNN. This work demonstrates that end-to-end DNN-based pose estimation can rival traditional part-based models while offering simplicity and scalability, with cascade refinement enhancing precision where it matters most.
Abstract
We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fashion and has a simple but yet powerful formulation which capitalizes on recent advances in Deep Learning. We present a detailed empirical analysis with state-of-art or better performance on four academic benchmarks of diverse real-world images.
