Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
Guowei Xu, Jiale Tao, Wen Li, Lixin Duan
TL;DR
This work tackles stochastic human motion prediction (SHMP), where generative models often predict poorly because the latent distribution receives little guidance. It introduces Semantic Latent Directions (SLD), an orthogonal latent basis that constrains the latent space and represents a future-motion hypothesis as $z=\sum_{m=1}^{M} w_m d_m$, where the $d_m$ are the $M$ latent directions and the $w_m$ are their coefficients. The latent code is decoded by $\widehat{Y}=G_\phi(X,z)$, and diverse samples are produced by projecting learnable motion queries into the SLD space. The method combines an encoder-decoder backbone with a Query-to-Latent Projection (QLP) and a DCT-based preprocessing pipeline, and it supports controllable prediction by editing the latent coefficients. Experiments on Human3.6M and HumanEva-I report state-of-the-art accuracy with competitive diversity, and ablations support the benefit of projecting queries into the semantically structured latent space. The work offers a lightweight route to semantically disentangled motion representations and controllable SHMP, with code and pretrained models released for reproducibility.
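As a rough illustration of the linear combination $z=\sum_{m=1}^{M} w_m d_m$, the sketch below builds an orthonormal set of directions and mixes them with a coefficient vector. It is not the authors' implementation: in the paper the directions are learned, whereas here they come from Gram-Schmidt applied to random vectors, and the names `orthonormal_directions` and `combine` as well as the sizes (4 directions, 16 latent dimensions) are assumptions made for the example.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_directions(num_directions, latent_dim, seed=0):
    """Return `num_directions` orthonormal vectors in R^latent_dim.

    Assumption: random vectors orthogonalised with Gram-Schmidt stand in
    for the learned directions d_m described in the paper.
    """
    if num_directions > latent_dim:
        raise ValueError("cannot fit more orthogonal directions than dimensions")
    rng = random.Random(seed)
    basis = []
    while len(basis) < num_directions:
        v = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Remove the components along the directions found so far.
        for d in basis:
            c = dot(v, d)
            v = [x - c * y for x, y in zip(v, d)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def combine(weights, directions):
    """Compute z = sum_m w_m * d_m."""
    latent_dim = len(directions[0])
    return [
        sum(w * d[i] for w, d in zip(weights, directions))
        for i in range(latent_dim)
    ]


directions = orthonormal_directions(num_directions=4, latent_dim=16)
weights = [0.5, -1.0, 0.0, 2.0]
z = combine(weights, directions)

# With an orthonormal basis, each coefficient is recovered by a dot product.
recovered = [dot(z, d) for d in directions]
```

Because the directions are orthonormal, `recovered` matches `weights` up to floating-point error, which is what makes each coefficient individually readable and editable.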
Abstract
In stochastic human motion prediction (SHMP), researchers have often turned to generative models such as GANs, VAEs, and diffusion models. However, most previous approaches struggle to predict motions that are both realistic and coherent with the past motion, because the latent distribution is learned with little guidance. In this paper, we introduce Semantic Latent Directions (SLD) to address this challenge: SLD constrains the latent space so that it learns meaningful motion semantics, which improves the accuracy of SHMP. SLD defines a set of orthogonal latent directions and represents each hypothesis of future motion as a linear combination of these directions. The resulting information bottleneck encourages the model to capture meaningful motion semantics, thereby improving the precision of motion predictions. Moreover, SLD enables controllable prediction, since the coefficients of the latent directions can be adjusted at inference time. Building on SLD, we introduce a set of motion queries to increase the diversity of predictions. Aligning these motion queries with the SLD space further improves the accuracy and coherence of the predicted motions. Through extensive experiments on widely used benchmarks, we show that our method predicts motions more accurately than prior work while balancing realism and diversity. Our code and pretrained models are available at https://github.com/GuoweiXu368/SLD-HMP.
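The two operations mentioned above, aligning a motion query with the SLD space and adjusting coefficients at inference time, can be pictured with a small worked example. This is only a sketch under stated assumptions: the alignment is modelled as a plain orthogonal projection onto two hand-picked orthonormal directions in $\mathbb{R}^3$, whereas the paper's projection is learned, and `project`, `combine`, and all numeric values are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(query, directions):
    """Coefficients w_m = <query, d_m> along each orthonormal direction.

    Assumption: a plain orthogonal projection stands in for the learned
    query-to-latent mapping.
    """
    return [dot(query, d) for d in directions]


def combine(weights, directions):
    """Compute z = sum_m w_m * d_m."""
    dim = len(directions[0])
    return [sum(w * d[i] for w, d in zip(weights, directions)) for i in range(dim)]


s = 1.0 / math.sqrt(2.0)
# Two hand-picked orthonormal directions spanning the x-y plane of R^3.
directions = [[s, s, 0.0], [s, -s, 0.0]]

# A hypothetical query with a component outside the span of the directions.
query = [3.0, 1.0, 5.0]
weights = project(query, directions)
z = combine(weights, directions)  # the out-of-span component is discarded

# "Control": shift only the second coefficient and decode the new latent.
edited = list(weights)
edited[1] += math.sqrt(2.0)
z_edited = combine(edited, directions)
```

Here `z` is approximately `[3, 1, 0]`, so the part of the query outside the span is dropped, and `z_edited` is approximately `[4, 0, 0]`: the edit moves the latent code only along the second direction, which is the sense in which editing one coefficient gives a targeted change.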
