DeepPose: Human Pose Estimation via Deep Neural Networks

Alexander Toshev, Christian Szegedy

TL;DR

DeepPose reframes human pose estimation as a regression problem solved by a convolutional neural network, and combines it with a cascade of refining regressors to achieve high-precision joint localization. By first regressing from the full image and then focusing on higher-resolution sub-images, the method captures global context while still allowing fine-grained refinement. The approach achieves state-of-the-art or better results on FLIC and LSP and generalizes well across related datasets, which suggests that a generic CNN learns robust pose representations. The work shows that end-to-end DNN-based pose estimation can rival traditional part-based models while remaining simple and scalable, with cascade refinement improving precision in the region around each predicted joint.

Abstract

We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high-precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fashion and has a simple yet powerful formulation which capitalizes on recent advances in Deep Learning. We present a detailed empirical analysis with state-of-the-art or better performance on four academic benchmarks of diverse real-world images.
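The control flow of the cascade described above can be illustrated with a minimal sketch. This is not the authors' implementation: `initial_regressor` and the entries of `refiners` are hypothetical callables standing in for trained networks, the image is assumed to be a 2-D grayscale array, and the fixed square `crop_size` is a simplifying assumption about how sub-images are chosen.

```python
import numpy as np

def crop_around(image, center, size):
    """Return a size x size window centered on `center` = (x, y), zero-padded
    at the borders, together with the integer center actually used."""
    h, w = image.shape[:2]
    half = size // 2
    # Round to a pixel and keep the center inside the image.
    x = int(round(float(np.clip(center[0], 0, w - 1))))
    y = int(round(float(np.clip(center[1], 0, h - 1))))
    padded = np.pad(image, ((half, half), (half, half)), mode="constant")
    # Original pixel (y, x) sits at (y + half, x + half) in `padded`,
    # so this slice places it at index (half, half) of the window.
    return padded[y:y + size, x:x + size], (x, y)

def cascade_pose(image, initial_regressor, refiners, crop_size=32):
    """Stage 1 regresses all joints from the full image; each later stage
    refines every joint from a sub-image around its current estimate."""
    # Hypothetical interface: initial_regressor(image) -> k rows of (x, y).
    joints = np.asarray(initial_regressor(image), dtype=float)
    for refine in refiners:
        updated = joints.copy()
        for i, joint in enumerate(joints):
            patch, (cx, cy) = crop_around(image, joint, crop_size)
            # Hypothetical interface: refine(patch, joint_index) -> (dx, dy),
            # a displacement relative to the center of the patch.
            dx, dy = refine(patch, i)
            updated[i] = (cx + dx, cy + dy)
        joints = updated
    return joints
```

Each stage only ever sees a window around the previous estimate, so a refiner can work at a finer scale than the first stage while the first stage supplies the global context.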

Paper Structure

This paper contains 19 sections, 8 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Besides extreme variability in articulations, many of the joints are barely visible. We can guess the location of the right arm in the left image only because we see the rest of the pose and anticipate the motion or activity of the person. Similarly, the left body half of the person on the right is not visible at all. These are examples of the need for holistic reasoning. We believe that DNNs can naturally provide this type of reasoning.
  • Figure 2: Left: schematic view of the DNN-based pose regression. We visualize the network layers with their corresponding dimensions, where convolutional layers are in blue, while fully connected ones are in green. We do not show the parameter-free layers. Right: at stage $s$, a refining regressor is applied on a sub-image to refine a prediction from the previous stage.
  • Figure 3: Percentage of detected joints (PDJ) on FLIC for two joints: elbow and wrist. We compare DeepPose, after two cascade stages, with four other approaches.
  • Figure 4: Percentage of detected joints (PDJ) on LSP for four limbs for DeepPose and Dantone et al. [dantone13regressors] over an extended range of distances to the true joint: $[0, 0.5]$ of the torso diameter. Results of DeepPose are plotted with solid lines, while the results of Dantone et al. are plotted with dashed lines. Results for the same joint from both algorithms share the same color.
  • Figure 5: Percentage of detected joints (PDJ) on FLIC for the first three stages of the DNN cascade. We present results over a larger spectrum of normalized distances between prediction and ground truth.
  • ...and 4 more figures
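
The PDJ curves in Figures 3 to 5 plot, for each threshold, the fraction of images in which a joint lies within that fraction of the torso diameter from its true position. A minimal sketch of that computation follows. The function name `pdj`, the array layout, and the use of `<=` at the threshold are assumptions for illustration; the torso diameter is taken as an input because how it is measured from the annotations is not stated here.

```python
import numpy as np

def pdj(pred, gt, torso_diameter, thresholds):
    """Percentage of detected joints, as a fraction in [0, 1].

    pred, gt:        arrays of shape (n_images, n_joints, 2) with (x, y) coordinates
    torso_diameter:  array of shape (n_images,), one normalizer per image
    thresholds:      array of shape (n_thresholds,), fractions of the torso diameter
    returns:         array of shape (n_thresholds, n_joints)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    torso = np.asarray(torso_diameter, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # Euclidean distance between prediction and ground truth per joint.
    dist = np.linalg.norm(pred - gt, axis=-1)            # (n_images, n_joints)
    # Normalize by each image's torso diameter.
    normalized = dist / torso[:, None]                   # (n_images, n_joints)
    # A joint counts as detected when its normalized distance is within the threshold.
    detected = normalized[None, :, :] <= thresholds[:, None, None]
    # Average over images to get one detection rate per threshold and joint.
    return detected.mean(axis=1)

# Example: sweep the extended range [0, 0.5] used in Figure 4.
thresholds = np.linspace(0.0, 0.5, 11)
```

Sweeping `thresholds` and plotting one row of the result per joint gives a curve of the kind shown in the figures.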