Human Motion Modeling using DVGANs
Xiao Lin, Mohamed R. Amer
TL;DR
DVGANs are a dense-validation, text-conditioned GAN framework for human motion modeling. The architecture combines CNN and RNN generators with a CNN discriminator that validates representations at multiple time resolutions. The discriminator input is perturbed to encourage translation invariance, and training uses WGAN-GP for stability. The approach supports generation of long, diverse motion sequences as well as motion completion. Quantitative evaluation on Human3.6M and CMU Mocap uses inception scores and a cross-modal ranker for animation retrieval, complemented by qualitative long-horizon results. The results indicate that dense validation and text conditioning improve realism and diversity, and suggest the model can generalize across actions and generate unseen motions.
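As a rough illustration of the WGAN-GP term mentioned above, the sketch below computes the standard gradient penalty, lambda * (||grad D(x_hat)||_2 - 1)^2, at a point x_hat interpolated between a real and a generated sample. Everything here is an assumption made for illustration: the toy `linear_critic` stands in for the CNN discriminator, the gradient is taken by finite differences rather than automatic differentiation, and `lam=10.0` is the default from the original WGAN-GP formulation, not a value confirmed for DVGANs.

```python
import math
import random


def interpolate(real, fake, eps):
    """Point on the straight line between a real and a generated sample."""
    return [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]


def linear_critic(weights, x):
    """Toy critic D(x) = w . x (hypothetical stand-in for the CNN discriminator)."""
    return sum(w * v for w, v in zip(weights, x))


def numerical_gradient(critic, x, h=1e-5):
    """Central-difference estimate of the critic's gradient at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((critic(up) - critic(down)) / (2.0 * h))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    """WGAN-GP penalty for one (real, fake) pair: lam * (||grad|| - 1)^2."""
    eps = rng.random()  # interpolation coefficient drawn from U[0, 1)
    x_hat = interpolate(real, fake, eps)
    grad = numerical_gradient(critic, x_hat)
    norm = math.sqrt(sum(g * g for g in grad))
    return lam * (norm - 1.0) ** 2


if __name__ == "__main__":
    # A linear critic with weights (3, 4) has gradient norm 5 everywhere,
    # so the penalty is approximately 10 * (5 - 1)^2 = 160.
    critic = lambda x: linear_critic([3.0, 4.0], x)
    print(gradient_penalty(critic, [0.0, 1.0], [1.0, 0.0]))
```

The penalty is zero only when the critic's gradient has unit norm at the interpolated point, which is how WGAN-GP softly enforces the 1-Lipschitz constraint. In a real training loop the gradient would come from an autodiff framework and the penalty would be averaged over a minibatch.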
Abstract
We present a novel generative model for human motion modeling using Generative Adversarial Networks (GANs). We formulate the GAN discriminator using dense validation at each time-scale and perturb the discriminator input to make it translation invariant. Our model is capable of motion generation and completion. Our evaluations show resilience to noise, generalization across actions, and generation of long, diverse sequences. We evaluate our approach on the Human3.6M and CMU motion capture datasets using inception scores.
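The two discriminator ideas in the abstract can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: each frame is reduced to a single scalar, the "translation" is assumed to be a random offset along the time axis (the abstract does not say which axis is perturbed), time scales are built by averaging neighbouring frames, and `score_fn` is a hypothetical per-step validation head. All function names are invented for illustration.

```python
import random


def random_time_crop(frames, length, rng=random):
    """Perturbation (assumed temporal): take a window at a random offset."""
    start = rng.randint(0, len(frames) - length)
    return frames[start:start + length]


def halve_resolution(features):
    """Average non-overlapping pairs of neighbouring time steps."""
    return [(features[i] + features[i + 1]) / 2.0
            for i in range(0, len(features) - 1, 2)]


def dense_validate(frames, score_fn):
    """Score every time step at every temporal resolution.

    Returns one list of scores per time scale, finest scale first,
    instead of a single scalar for the whole sequence.
    """
    scores_per_scale = []
    features = list(frames)
    while features:
        scores_per_scale.append([score_fn(f) for f in features])
        if len(features) == 1:
            break
        features = halve_resolution(features)
    return scores_per_scale


def aggregate(scores_per_scale):
    """Mean over all scores at all scales (one possible way to combine them)."""
    flat = [s for scale in scores_per_scale for s in scale]
    return sum(flat) / len(flat)


if __name__ == "__main__":
    motion = [float(t) for t in range(10)]       # toy 1-D "motion"
    window = random_time_crop(motion, length=8)  # perturbed discriminator input
    scores = dense_validate(window, score_fn=lambda x: x)
    print([len(s) for s in scores])              # [8, 4, 2, 1]
    print(aggregate(scores))
```

Scoring every position at every scale gives the generator feedback at each temporal resolution rather than one signal at the end of the sequence, and the random offset keeps the discriminator from depending on where in the window a motion starts.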
