ProPLIKS: Probablistic 3D human body pose estimation

Karthik Shetty, Annette Birkhold, Bernhard Egger, Srikrishna Jaganathan, Norbert Strobel, Markus Kowarschik, Andreas Maier

TL;DR

ProPLIKS presents a probabilistic framework for 3D human mesh recovery from 2D inputs by modeling rotations on the $SO(3)$ manifold with a Möbius-flow-based normalizing flow and coupling it with a conditional Gaussian shape prior and a differentiable PLIKS solver. This design yields multiple plausible pose hypotheses, improved 2D alignment, and the ability to incorporate multi-view data without retraining. The approach demonstrates strong performance on RGB benchmarks and extends to X‑ray datasets, with notable gains from multi-view integration and ablation-supported advantages of the $SO(3)$-aware distribution. By combining rotation-aware probabilistic modeling with deterministic keypoint alignment, ProPLIKS offers a scalable, adaptable solution for both standard computer vision tasks and medical-imaging scenarios.

Abstract

We present a novel approach for 3D human pose estimation by employing probabilistic modeling. This approach leverages the advantages of normalizing flows in non-Euclidean geometries to address uncertain poses. Specifically, our method employs a normalizing flow tailored to the $SO(3)$ rotation group, incorporating a coupling mechanism based on the Möbius transformation. This enables the framework to accurately represent any distribution on $SO(3)$, effectively addressing issues related to discontinuities. Additionally, we reinterpret the challenge of reconstructing 3D human figures from 2D pixel-aligned inputs as the task of mapping these inputs to a range of probable poses. This perspective acknowledges the intrinsic ambiguity of the task and facilitates a straightforward integration method for multi-view scenarios. The combination of these strategies showcases the effectiveness of probabilistic models in complex human pose estimation scenarios. Our approach notably surpasses existing methods in the field of pose estimation. We also validate our methodology on human pose estimation from RGB images as well as medical X-Ray datasets.

Paper Structure

This paper contains 33 sections, 5 equations, 9 figures, 5 tables, 3 algorithms.

Figures (9)

  • Figure 1: In the Möbius layer, $\boldsymbol{u}_1$ acts as the identity vector, which is conditioned by a neural network to derive $\boldsymbol{\omega}$. The vector $\boldsymbol{u}_2$ is subjected to the Möbius transformation. The third vector, $\boldsymbol{u}_3$, is generated from the cross product of $\boldsymbol{u}_2$ and $\boldsymbol{u}_1$. We project $\boldsymbol{u}_2$ and $\boldsymbol{\omega}$ onto the complex plane (indicated in yellow) using the basis vectors $\boldsymbol{u}_2$ and $\boldsymbol{u}_3$. The Möbius transformation is then applied to produce the transformed vector $\boldsymbol{\hat{u}}_2$, which is subsequently re-projected onto the 3D space to yield $\boldsymbol{\bar{u}}_2$.
  • Figure 2: Overview of the proposed framework: Starting with an input image, the backbone of the network generates a feature vector $\mathbf{x}$ that is used for sampling poses via normalizing flow and shapes through Gaussian sampling. Concurrently, it predicts a sparse set of keypoints on the image plane, each associated with a measure of uncertainty. These sampled poses and shapes are then input into the PLIKS solver, which optimizes the parameters for pose, shape, and translation based on the predicted keypoints. Additionally, the solver has the capability to process multiple images simultaneously, integrating individual predictions from each image sample.
  • Figure 3: Variations in flow sampling. Here, green represents the rotation distribution that needs to be learned, red indicates the samples obtained after training the flow model, and blue represents the mode of the distribution.
  • Figure 4: Qualitative results are displayed for both RGB and X-Ray images. Pink represents the mode of the distribution, while green shows a sample from the distribution. The fourth row quantifies the variance from the sampled poses in centimeters. The scale of variance is 15 cm for RGB images and 1.5 cm for X-Ray images. The last row provides a 3D depiction of the network's predictions.
  • Figure 5: Qualitative results on multi-view images (from 4 cameras) from the Human3.6M dataset.
  • ...and 4 more figures
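
The Möbius layer described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation. It uses the standard Möbius map on the unit circle, $h_{\omega}(z) = \frac{1 - \|\omega\|^2}{\|z - \omega\|^2}(z - \omega) - \omega$, applied to $\boldsymbol{u}_2$ in the plane orthogonal to $\boldsymbol{u}_1$. Several details are assumptions made only for illustration: the name `omega_raw` stands in for the output of the conditioning network, the squashing `omega / (1 + ||omega||)` is one possible way to keep $\|\omega\| < 1$, and the third column is rebuilt as $\boldsymbol{u}_1 \times \boldsymbol{\bar{u}}_2$ so that the result stays right-handed.

```python
import math


def dot(a, b):
    """Inner product of two 3-vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def mobius_on_circle(z, omega):
    """Moebius map h(z) = (1 - |w|^2) / |z - w|^2 * (z - w) - w.

    If z is a unit vector and |w| < 1, the output is again a unit vector
    lying in the span of z and w.
    """
    diff = [zi - wi for zi, wi in zip(z, omega)]
    scale = (1.0 - dot(omega, omega)) / dot(diff, diff)
    return [scale * d - w for d, w in zip(diff, omega)]


def mobius_layer(columns, omega_raw):
    """Apply one illustrative Moebius step to the columns (u1, u2, u3) of a rotation.

    omega_raw is a hypothetical stand-in for a network output conditioned on u1.
    """
    u1, u2, _ = columns
    # Remove the component along u1 so omega lies in the plane orthogonal to u1.
    along = dot(omega_raw, u1)
    omega = [w - along * a for w, a in zip(omega_raw, u1)]
    # Assumed squashing: guarantees |omega| < 1 for any finite input.
    norm = math.sqrt(dot(omega, omega))
    omega = [w / (1.0 + norm) for w in omega]
    # u1 is passed through unchanged; u2 is moved along the circle around u1.
    u2_new = mobius_on_circle(u2, omega)
    # Rebuild the third column so the columns stay orthonormal and right-handed.
    u3_new = cross(u1, u2_new)
    return u1, u2_new, u3_new


# Usage: start from the identity rotation and apply one step.
u1, u2, u3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
new_u1, new_u2, new_u3 = mobius_layer((u1, u2, u3), [0.3, -0.5, 0.8])
```

Because the map sends the unit circle to itself, the output columns remain orthonormal with determinant +1, so the result is still a valid rotation. With `omega_raw` set to zero, or parallel to `u1`, the step reduces to the identity.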