Auto-Conditioned Recurrent Networks for Extended Complex Human Motion Synthesis
Zimo Li, Yi Zhou, Shuangjiu Xiao, Chong He, Zeng Huang, Hao Li
TL;DR
The paper tackles long-horizon, realistic motion synthesis by addressing the error accumulation that affects autoregressive models. It introduces the auto-conditioned RNN (acRNN), a training regime in which the network is fed its own outputs as input over intervals of fixed length, so that it learns to recover from its own prediction errors. This enables sustained generation of diverse motions such as dances and martial arts. In quantitative and qualitative evaluations on CMU datasets, acRNN shows markedly improved long-term stability over prior RNN-based approaches, generating hundreds of seconds of coherent motion without permanently diverging or freezing. The approach has practical implications for real-time animation and VR, since it supports richer, stylistically varied motion generation without relying on extensive databases or hand-crafted priors.
Abstract
We present a real-time method for synthesizing highly complex human motions using a novel training regime we call the auto-conditioned Recurrent Neural Network (acRNN). Recently, researchers have attempted to synthesize new motion using autoregressive techniques, but existing methods tend to freeze or diverge after a couple of seconds because errors that are fed back into the network accumulate. Furthermore, such methods have only been shown to be reliable for relatively simple human motions, such as walking or running. In contrast, our approach can synthesize arbitrary motions with highly complex styles, including dances and martial arts in addition to locomotion. The acRNN accomplishes this by explicitly accounting for autoregressive noise accumulation during training. To our knowledge, our work is the first to demonstrate the generation of over 18,000 continuous frames (300 seconds) of new complex human motion in different styles.
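To make the training idea concrete, the following is a minimal sketch of how auto-conditioning could be scheduled. It is not the authors' implementation. The names `auto_condition_mask`, `rollout`, `gt_len`, and `cond_len` are hypothetical, and the strictly alternating pattern of ground-truth and self-fed frames is an assumption; the text above only states that the network conditions on its own outputs over a fixed length. The `step` callable stands in for one forward pass of the recurrent network, whose hidden state is omitted here for brevity.

```python
from typing import Callable, List, Sequence


def auto_condition_mask(seq_len: int, gt_len: int, cond_len: int) -> List[bool]:
    """Return one flag per time step: True means the network input is its
    own previous output, False means the ground-truth frame is used.

    Assumed schedule: gt_len ground-truth steps, then cond_len self-fed
    steps, repeated until seq_len is reached.
    """
    period = gt_len + cond_len
    return [(t % period) >= gt_len for t in range(seq_len)]


def rollout(
    step: Callable[[float], float],
    frames: Sequence[float],
    gt_len: int,
    cond_len: int,
) -> List[float]:
    """Run `step` over a sequence, switching inputs according to the mask.

    `step` is a stand-in for the network; a real RNN would also carry a
    hidden state from one call to the next.
    """
    mask = auto_condition_mask(len(frames), gt_len, cond_len)
    outputs: List[float] = []
    prev_out = None
    for t, frame in enumerate(frames):
        # Fall back to ground truth when no previous output exists yet.
        use_own_output = mask[t] and prev_out is not None
        x = prev_out if use_own_output else frame
        prev_out = step(x)
        outputs.append(prev_out)
    return outputs


if __name__ == "__main__":
    # Toy "network" that adds 1 to its input, with scalar frames.
    print(auto_condition_mask(6, gt_len=2, cond_len=1))
    print(rollout(lambda x: x + 1, [0, 10, 20, 30], gt_len=2, cond_len=2))
```

During the self-fed steps the training loss would still compare each output with the ground-truth next frame, so the network is penalized for drift it caused itself. At synthesis time every step after the seed frames is self-fed, which is the condition the schedule is meant to prepare the network for.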
