Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation

Ziyu Wang, Shuangpeng Han, Mengmi Zhang

TL;DR

This paper introduces the challenge of unsupervised categorical prior learning in pose estimation, where AI models learn a general pose prior for an object category from images in a self-supervised manner.

Abstract

A prior represents a set of beliefs or assumptions about a system, aiding inference and decision-making. In this paper, we introduce the challenge of unsupervised categorical prior learning in pose estimation, where AI models learn a general pose prior for an object category from images in a self-supervised manner. Although priors are effective in estimating pose, acquiring them can be difficult. We propose a novel method, named Pose Prior Learner (PPL), to learn a general pose prior for any object category. PPL uses a hierarchical memory to store compositional parts of prototypical poses, from which we distill a general pose prior. This prior improves pose estimation accuracy through template transformation and image reconstruction. PPL learns meaningful pose priors without any additional human annotations or interventions, outperforming competitive baselines on both human and animal pose estimation datasets. Notably, our experimental results reveal the effectiveness of PPL using learned prototypical poses for pose estimation on occluded images. Through iterative inference, PPL leverages the pose prior to refine estimated poses, regressing them to any prototypical poses stored in memory. Our code, model, and data are publicly available at: https://github.com/ZhangLab-DeepNeuroCogLab/Pose-Prior-Learner.

Paper Structure (22 sections, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Schematic illustration of the unsupervised categorical prior learning challenge and of the learned prior in use. On the left (a), given a series of input images (blue frames), the goal is to learn a pose prior (green rectangle) in a fully self-supervised manner. This pose prior consists of both keypoint and connectivity priors. To tackle this problem, we introduce a model named Pose Prior Learner (PPL). On the right (b), we showcase how the learned pose prior enhances performance in challenging tasks such as pose estimation and body pose inference under occlusion. Notably, although PPL is trained only on full-body, non-occluded images, it still produces plausible pose predictions even when substantial occlusions are present.
  • Figure 2: Overview of our proposed Pose Prior Learner (PPL). We first distill the keypoint prior from the hierarchical memory $M$. Features of the image $I$ and the embedding of the keypoint prior are concatenated to predict the affine transformation parameters. The keypoint prior is transformed, and its pairwise links are modulated with the connectivity prior $W$ to obtain the combined link heatmap $S$. The concatenation of the link heatmap $S$ and the reference image $I_{ref}$ is decoded to produce the reconstructed image $I_{recon}$. The $sg$ symbol represents the stop-gradient operation. The red arrows indicate the gradient flow during backpropagation based on image reconstruction. See Section \ref{sec:trainingAndInference} for training details.
  • Figure 3: Overview of the iterative inference strategy in our PPL (Section \ref{sec:trainingAndInference}). During inference, we iteratively use the reconstructed image $I_{recon}$ as input to estimate the pose $T'$. The hierarchical memory $M$ refines the estimated pose $T'$ and outputs $T'_{recon}$. The original image $I$ serves as the reference image for reconstructing $I_{recon}$, which is then used as the input image in the next iteration.
  • Figure 4: Visualization of pose estimation results on Human3.6m. (a) Pose estimation on occluded images in Human3.6m. The first column shows the original image and its pose as estimated by PPL. Columns 2-5 show the iterative inference process, in which the images reconstructed by PPL (Rows 1 and 3) are fed back to the model to estimate poses (Rows 2 and 4) on images occluded with either CenterMasking (Rows 1 and 2) or RandomMasking (Rows 3 and 4). (b) The pose prior evolves as a function of training epochs (from top to bottom). (c) Comparison between PPL and AutoLink \cite{autolink} in estimated poses on example images from Human3.6m. The left column shows testing images; the middle and right columns show poses estimated by AutoLink and PPL, respectively. All results are obtained with 16 keypoints.
  • Figure A1: Comparison between PPL and AutoLink pose estimation results on occluded images from Human3.6m. The left column shows testing images, the middle column shows results from AutoLink, and the right column shows results from PPL. All results are obtained with 16 keypoints.
  • ...and 7 more figures
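
The template transformation and link heatmap steps described in the Figure 2 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the Gaussian falloff around each link, the `sigma` value, the max-combination of links, and the toy three-keypoint template are all assumptions made for illustration. In PPL the affine parameters are predicted by a network and the priors are learned; here they are fixed by hand.

```python
import numpy as np

def affine_transform(keypoints, params):
    """Apply a 2D affine map to (K, 2) keypoints; params is a (2, 3) matrix [A | t]."""
    A, t = params[:, :2], params[:, 2]
    return keypoints @ A.T + t

def link_heatmap(keypoints, W, size=32, sigma=0.05):
    """Render one heatmap S for all pairwise links, weighted by connectivity W.

    Assumption: each link is drawn as a Gaussian ridge around its line segment,
    and links are combined with a pixel-wise maximum.
    """
    # Pixel grid over [-1, 1] x [-1, 1], shape (size, size, 2) in (x, y) order.
    axis = np.linspace(-1.0, 1.0, size)
    xs, ys = np.meshgrid(axis, axis)
    grid = np.stack([xs, ys], axis=-1)
    S = np.zeros((size, size))
    K = len(keypoints)
    for i in range(K):
        for j in range(i + 1, K):
            a, b = keypoints[i], keypoints[j]
            ab = b - a
            # Project every pixel onto segment a-b, clamped to the endpoints.
            u = np.clip(((grid - a) @ ab) / (ab @ ab + 1e-8), 0.0, 1.0)
            nearest = a + u[..., None] * ab
            d2 = np.sum((grid - nearest) ** 2, axis=-1)
            # Gaussian falloff around the segment, scaled by the link weight.
            S = np.maximum(S, W[i, j] * np.exp(-d2 / sigma ** 2))
    return S

# Toy keypoint prior (hypothetical): three points on a vertical line.
template = np.array([[0.0, -0.5], [0.0, 0.0], [0.0, 0.5]])
# Hand-picked affine parameters: scale by 0.8, shift right by 0.1.
params = np.array([[0.8, 0.0, 0.1], [0.0, 0.8, 0.0]])
# Toy connectivity prior: only neighbouring keypoints are linked.
W = np.zeros((3, 3))
W[0, 1] = 1.0
W[1, 2] = 1.0

pose = affine_transform(template, params)
S = link_heatmap(pose, W)
```

In this sketch, a link whose weight in `W` is zero contributes nothing to `S`, which mirrors how a connectivity prior can suppress implausible links between keypoints.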