Controllable Long-term Motion Generation with Extended Joint Targets
Eunjong Lee, Eunhee Kim, Sanghoon Hong, Eunho Jung, Jihoon Kim
TL;DR
COMET tackles real-time, controllable long-horizon human motion generation by unifying a Transformer-based conditional VAE with an adaptive joint-control mechanism. A joint-wise attention scheme enables arbitrary subsets of joints to be controlled without retraining, while a reference-guided feedback loop grounds generation in a learned pose manifold to prevent drift. The approach also supports plug-and-play stylization by swapping style GMMs at inference. Empirical results show strong performance on single- and multi-joint control, long-horizon tasks, in-betweening, and stylization, outperforming state-of-the-art baselines and demonstrating real-time viability for interactive applications.
Abstract
Generating stable and controllable character motion in real time is a key challenge in computer animation. Existing methods often fail to provide fine-grained control or suffer from motion degradation over long sequences, limiting their use in interactive applications. We propose COMET, an autoregressive framework that runs in real time, enabling versatile character control and robust long-horizon synthesis. Our efficient Transformer-based conditional VAE allows precise, interactive control over arbitrary user-specified joints for tasks such as goal-reaching and in-betweening, all from a single model. To ensure long-term temporal stability, we introduce a novel reference-guided feedback mechanism that prevents error accumulation. This mechanism also serves as a plug-and-play stylization module, enabling real-time style transfer. Extensive evaluations demonstrate that COMET robustly generates high-quality motion at real-time speeds, significantly outperforming state-of-the-art approaches in complex motion control tasks and confirming its readiness for demanding interactive applications.
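To make the idea of reference-guided feedback concrete, the toy sketch below shows how pulling each autoregressive prediction toward a reference distribution can keep a rollout from drifting, and how swapping that reference changes the result without touching the generator. It is an illustration only, not COMET's implementation: the function names (`gmm_pull`, `rollout`), the isotropic GMM with a shared `sigma`, the linear blend weight `alpha`, and the biased `step_fn` standing in for a learned model are all assumptions made for this example.

```python
import math

def gmm_pull(pose, means, weights, sigma=1.0):
    """Responsibility-weighted mean of an isotropic GMM, evaluated at `pose`.

    Hypothetical helper: a stand-in for "nearest point on a reference pose
    distribution". All components share one sigma, so normalizers cancel.
    """
    logs = [
        math.log(w) - sum((p - m) ** 2 for p, m in zip(pose, mu)) / (2 * sigma ** 2)
        for w, mu in zip(weights, means)
    ]
    top = max(logs)  # subtract the max before exponentiating for stability
    resp = [math.exp(v - top) for v in logs]
    total = sum(resp)
    resp = [r / total for r in resp]
    return [sum(r * mu[d] for r, mu in zip(resp, means)) for d in range(len(pose))]

def rollout(step_fn, pose, means, weights, n_steps, alpha=0.2):
    """Autoregressive rollout with a feedback blend toward the reference GMM.

    alpha = 0 disables the feedback; larger values pull harder (assumed form).
    """
    trajectory = [list(pose)]
    for _ in range(n_steps):
        pred = step_fn(pose)                   # one-step prediction
        ref = gmm_pull(pred, means, weights)   # reference pose for this prediction
        pose = [(1 - alpha) * p + alpha * r for p, r in zip(pred, ref)]
        trajectory.append(pose)
    return trajectory

def biased_step(pose):
    """Toy generator with a constant per-step error, mimicking drift."""
    return [x + 0.1 for x in pose]

start = [0.0, 0.0]
neutral = ([[0.0, 0.0]], [1.0])   # toy "neutral" reference
styled = ([[1.0, 1.0]], [1.0])    # toy "style" reference, swapped in at inference

free = rollout(biased_step, start, *neutral, n_steps=200, alpha=0.0)
guided = rollout(biased_step, start, *neutral, n_steps=200)
restyled = rollout(biased_step, start, *styled, n_steps=200)
```

In this toy setting the unguided rollout accumulates its per-step error linearly, while the guided rollouts settle at a bounded fixed point determined by the reference. Swapping the reference moves that fixed point, which is the sense in which a feedback term can double as a plug-in style control.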
