Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers

Sahil Bhandary Karnoor, Romit Roy Choudhury

TL;DR

This paper formulates pose estimation as an inverse problem and designs an algorithm capable of zero-shot generalization. The algorithm generatively estimates the highly likely sequence of poses that best explains the sparse on-body measurements.

Abstract

Pose estimation refers to tracking a human's full body posture, including their head, torso, arms, and legs. The problem is challenging in practical settings where the number of body sensors is limited. Past work has shown promising results using conditional diffusion models, where the pose prediction is conditioned on both the location and the rotation measurements from the sensors. Unfortunately, nearly all these approaches generalize poorly across users, primarily because location measurements are highly influenced by the body size of the user. In this paper, we formulate pose estimation as an inverse problem and design an algorithm capable of zero-shot generalization. Our idea utilizes a pre-trained diffusion model and conditions it on rotational measurements alone; the priors from this model are then guided by a likelihood term derived from the measured locations. Thus, given any user, our proposed InPose method generatively estimates the highly likely sequence of poses that best explains the sparse on-body measurements.
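
To make the idea of a prior estimate "guided by a likelihood term" concrete, the following is a minimal toy sketch, not the InPose algorithm. It assumes a Gaussian likelihood and a linear measurement matrix `A` standing in for whatever maps a pose to sensor locations (in practice this mapping would be nonlinear). The names `matvec`, `likelihood_gradient`, `guided_estimate`, `sigma`, and `step` are all hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def likelihood_gradient(A, x_hat, y, sigma):
    """Gradient of 0.5 * ||y - A x||^2 / sigma^2 with respect to x, at x_hat.

    Assumption: Gaussian measurement noise with standard deviation sigma and
    a linear measurement model y = A x (a toy stand-in, not the paper's model).
    """
    residual = [p - m for p, m in zip(matvec(A, x_hat), y)]  # A x_hat - y
    n_rows, n_cols = len(A), len(x_hat)
    # A^T residual, scaled by 1 / sigma^2
    return [
        sum(A[i][j] * residual[i] for i in range(n_rows)) / sigma ** 2
        for j in range(n_cols)
    ]


def guided_estimate(A, x_hat, y, sigma, step):
    """Nudge the prior estimate x_hat toward agreement with measurements y."""
    grad = likelihood_gradient(A, x_hat, y, sigma)
    return [x - step * g for x, g in zip(x_hat, grad)]


# Toy usage: only the first two coordinates are "measured".
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x_prior = [0.0, 0.0, 2.0]   # estimate from the (hypothetical) prior model
y_meas = [1.0, -1.0]        # sparse measurements
x_guided = guided_estimate(A, x_prior, y_meas, sigma=1.0, step=0.5)
```

In this toy setting the measured coordinates move halfway toward the measurements while the unmeasured coordinate is left to the prior, which is the division of labor the abstract describes between the rotation-conditioned prior and the location-derived likelihood.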

Paper Structure

This paper contains 15 sections, 2 theorems, 22 equations, 12 figures, 3 tables, 1 algorithm.

Key Result

theorem 1

We are given a well-trained error model $\epsilon_{\theta}$, that learns the error distribution $\epsilon_t\leftarrow \epsilon_{\theta}(r^t_{M},t,r_{m})$, and denoises $\hat{r}^t_{M} \leftarrow \frac{r^t_{M}-\sqrt{1-\bar{\alpha}_t}\epsilon_{t}}{\sqrt{\bar{\alpha}_t}}$. If the model ensures that $||\
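
The denoising step in the statement above, $\hat{r}^t_{M} \leftarrow \frac{r^t_{M}-\sqrt{1-\bar{\alpha}_t}\epsilon_{t}}{\sqrt{\bar{\alpha}_t}}$, can be checked numerically. The sketch below is a plain-Python illustration rather than the authors' implementation. `estimate_clean` applies that formula elementwise, and `forward_noise` is the standard DDPM forward noising step, assumed here only to build a round-trip check. Both function names are hypothetical, vectors are plain lists, and the noise is supplied directly instead of being predicted by $\epsilon_{\theta}$.

```python
import math


def forward_noise(r_0, eps, alpha_bar_t):
    """Standard DDPM forward step: sqrt(alpha_bar) * r_0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * r + noise_scale * e for r, e in zip(r_0, eps)]


def estimate_clean(r_t, eps_t, alpha_bar_t):
    """Clean-sample estimate: (r_t - sqrt(1 - alpha_bar) * eps_t) / sqrt(alpha_bar)."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(r - noise_scale * e) / signal_scale for r, e in zip(r_t, eps_t)]


# Round trip with made-up values: noise a sample, then undo it with the same noise.
r_0 = [0.3, -1.2, 0.5]
eps = [0.1, 0.4, -0.7]
alpha_bar_t = 0.64
r_t = forward_noise(r_0, eps, alpha_bar_t)
r_hat = estimate_clean(r_t, eps, alpha_bar_t)
```

When the supplied noise equals the noise that was actually added, `r_hat` matches `r_0` up to floating-point error. With a learned error model the prediction is only approximate, so the estimate inherits that prediction error scaled by $\sqrt{1-\bar{\alpha}_t}/\sqrt{\bar{\alpha}_t}$.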

Figures (12)

  • Figure 1: (a) InPose's input and output visualized over $4$ time frames. (b) "T" pose. (c) Pose with depiction of rotation angle and root translation.
  • Figure 2: InPose pipeline: 3-sensor rotation + location measurements are inputs. Rotations fed as conditions to CFG which outputs conditional prior; location measurements estimate the likelihood, which steers denoising.
  • Figure 3: (a) Position error vs. body scale. (b) Rotation error vs. body scale. (c) Position error vs. location noise. All these tests were performed using Protocol 1.
  • Figure 4: Qualitative results with scaling body size. The same pose is used for all scales.
  • Figure 5: 6DoF vs. rotation matrix.
  • ...and 7 more figures

Theorems & Definitions (3)

  • theorem 1
  • theorem 1
  • proof