Human Motion Modeling using DVGANs

Xiao Lin, Mohamed R. Amer

TL;DR

DVGANs introduce a dense-validation, text-conditioned GAN framework for human motion modeling. The architecture pairs CNN and RNN generators with a CNN discriminator that validates representations at multiple time resolutions. The discriminator input is perturbed to encourage translation invariance, and WGAN-GP training is used for stability. The approach enables long, diverse motion sequences and effective motion completion. It is evaluated quantitatively on Human3.6M and CMU Mocap using inception scores and a cross-modal ranker for animation retrieval, and qualitatively on long-horizon generation. The work reports that dense validation and text conditioning improve realism and diversity, and shows potential for generalizing across actions and generating unseen motions.

Abstract

We present a novel generative model for human motion modeling using Generative Adversarial Networks (GANs). We formulate the GAN discriminator using dense validation at each time-scale and perturb the discriminator input to make it translation invariant. Our model is capable of motion generation and completion. Our evaluations show resilience to noise, generalization over actions, and generation of long, diverse sequences. We evaluate our approach on the Human3.6M and CMU motion capture datasets using inception scores.
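
The two discriminator-side ideas named in the abstract, perturbing the input by shifting where a sequence starts and validating at several time resolutions, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `random_start_crop` and `temporal_pyramid` are hypothetical, and pairwise frame averaging is only an assumed stand-in for the learned CNN layers that would summarize a sequence at each level.

```python
import random


def random_start_crop(sequence, length, rng=random):
    """Return `length` consecutive frames starting at a random offset.

    Assumed stand-in for perturbing the discriminator input by
    changing the starting point of a motion sequence.
    """
    if length > len(sequence):
        raise ValueError("window is longer than the sequence")
    start = rng.randrange(len(sequence) - length + 1)
    return sequence[start:start + length]


def temporal_pyramid(sequence, levels):
    """Build coarser summaries by averaging adjacent frame pairs.

    Each level halves the number of frames. A real discriminator would
    use learned layers here and emit a real/fake score at every level;
    averaging is only a placeholder for that summarization.
    """
    pyramid = [sequence]
    current = sequence
    for _ in range(levels):
        if len(current) < 2:
            break
        current = [
            [(a + b) / 2.0 for a, b in zip(current[i], current[i + 1])]
            for i in range(0, len(current) - 1, 2)
        ]
        pyramid.append(current)
    return pyramid


# Toy motion: 16 frames with 3 values per frame (hypothetical data).
motion = [[float(t), float(t) * 2.0, 0.0] for t in range(16)]
window = random_start_crop(motion, 8, random.Random(0))
pyramid = temporal_pyramid(window, 3)
resolutions = [len(level) for level in pyramid]
```

Here `resolutions` lists how many frames remain at each level, which is where a dense-validation discriminator would attach one score per time resolution rather than a single score for the whole sequence.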

Paper Structure

This paper contains 21 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The architecture of our DVGANs. The generator network $G$ generates an animation hierarchically by interpolating higher-level hidden representations into lower-level representations, conditioned on textual descriptions. We formulate our approach using two different generators, one using an RNN (bottom) and one using a CNN (middle). The discriminator network $D$ appraises videos as generated or real by summarizing each video into hidden representations at different levels and then validating whether those representations match the textual descriptions. We formulate the discriminator using a CNN (top). We inject noise at the input of the discriminator by changing the starting point of different actions, and we perform dense validation at the different time resolutions.
  • Figure 2: Qualitative results on the CMU Mocap dataset (top) and H3.6M (bottom). DVGANs generate smooth, diverse actions on both datasets.
  • Figure 3: Qualitative results for motion completion on the H3.6M dataset.
  • Figure S1: The architecture of our CNN and RNN rankers.
  • Figure S2: The architecture of our RNN discriminator.
  • ...and 1 more figure