Learned Neural Physics Simulation for Articulated 3D Human Pose Reconstruction

Mykhaylo Andriluka, Baruch Tabanpour, C. Daniel Freeman, Cristian Sminchisescu

TL;DR

The proposed neural network approach, LARP (Learned Articulated Rigid body Physics), is used as a drop-in replacement for a state-of-the-art classical non-differentiable simulator in an existing video-based reconstruction framework and shows comparable or better 3D human pose reconstruction accuracy.

Abstract

We propose a novel neural network approach, LARP (Learned Articulated Rigid body Physics), to model the dynamics of articulated human motion with contact. Our goal is to develop a faster and more convenient methodological alternative to traditional physics simulators for use in computer vision tasks such as human motion reconstruction from video. To that end, we introduce a training procedure and model components that support the construction of a recurrent neural architecture to accurately simulate articulated rigid body dynamics. Our neural architecture supports features typically found in traditional physics simulators, such as modeling of joint motors, variable dimensions of body parts, and contact between body parts and objects, and is an order of magnitude faster than traditional systems when multiple simulations are run in parallel. To demonstrate the value of LARP, we use it as a drop-in replacement for a state-of-the-art classical non-differentiable simulator in an existing video-based reconstruction framework and show comparable or better 3D human pose reconstruction accuracy.

Paper Structure

This paper contains 16 sections, 5 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: Left: Examples of articulated 3D human pose reconstructions obtained with LARP on public benchmarks [h36m_pami, aist-dance-db] and real-world video. Right: Comparison to common physics simulators [coumans2015bullet, MJX, todorov2012mujoco, freeman2021brax] in terms of simulation speed. The x-axis shows the number of parallel simulations, whereas the y-axis shows the total time taken to advance all the simulations to the next step.
  • Figure 2: Left: Overview of our approach (LARP). At time $t$ the input to the neural simulator is given by the state of the scene $\mathbf{S}_t$ and the joint control targets $\mathbf{Q}^p_{t}$. Here the state is composed of the state of the person $\mathbf{S}_t^p$ and the state of the ball $\mathbf{S}_t^b$. LARP propagates the state through time by recurrently applying contact and dynamics networks. We visualize the state of each rigid component of the articulated body using rectangles with a color matching the scene structures. Right: Scenarios used to evaluate our approach: chain of linked capsules (a), two colliding capsule chains (b), articulated pose reconstruction from video (c), human-ball and human-capsule collision handling (d). See Supp. Mat. for videos.
  • Figure 3: Example of a generated sequence obtained with a model that includes displacement features and displacement loss (bottom row), and without either of these elements (middle row). We highlight inconsistencies in the joint positions with red circles.
  • Figure 4: Experiments with the dynamics network: training subsequence length $N_h$ (a), ablations of the joint displacement feature and loss (b), ablations of non-linearity type and learning rate schedule (c), and evaluation of variants for the contact network (d). Error bars show one standard deviation calculated over 5 runs. The x-axis sweeps the time window over which metrics are computed.
  • Figure 5: Evaluation of LARP on datasets with colliding objects.
  • ...and 4 more figures
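
The recurrent propagation described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `rollout`, `toy_contact` and `toy_dynamics`, the flat-list state layout, and the toy update rules are all assumptions made for illustration. The only structure taken from the caption is that, at each step, a contact stage and a dynamics stage are applied to the current state $\mathbf{S}_t$ and the control targets $\mathbf{Q}^p_t$ to produce the next state.

```python
from typing import Callable, List, Sequence

State = List[float]    # hypothetical flattened scene state S_t
Targets = List[float]  # hypothetical joint control targets Q^p_t


def rollout(
    s0: State,
    targets: Sequence[Targets],
    contact_net: Callable[[State], List[float]],
    dynamics_net: Callable[[State, List[float], Targets], State],
) -> List[State]:
    """Recurrently apply a contact stage and a dynamics stage.

    Returns the list of states [S_0, S_1, ..., S_T], one more than
    the number of control targets.
    """
    states = [s0]
    for q_t in targets:
        s_t = states[-1]
        c_t = contact_net(s_t)                       # contact term from the current state
        states.append(dynamics_net(s_t, c_t, q_t))   # predicted next state
    return states


# Toy stand-ins for the two stages. These are NOT learned networks; they
# only make the sketch runnable.
def toy_contact(s: State) -> List[float]:
    # Push any coordinate below zero back up by its penetration depth.
    return [max(0.0, -x) for x in s]


def toy_dynamics(s: State, c: List[float], q: Targets, gain: float = 0.5) -> State:
    # Move each coordinate part-way toward its target, then add the contact term.
    return [x + gain * (t - x) + ci for x, ci, t in zip(s, c, q)]


trajectory = rollout([0.0, -1.0], [[1.0, 1.0]] * 3, toy_contact, toy_dynamics)
```

Because the rollout only depends on two callables, the toy functions could be swapped for trained models without changing the loop, which is one way to read the "drop-in replacement" framing in the abstract.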