FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation
Arvin Tashakori, Arash Tashakori, Gongbo Yang, Z. Jane Wang, Peyman Servati
TL;DR
FlexMotion tackles the challenge of generating controllable, physically plausible human motion with high efficiency by learning in a latent space and avoiding external physics simulators. It integrates a physics-aware multimodal Transformer autoencoder with a latent diffusion model conditioned on text and a Spatial Controllability Module to steer motion via biomechanical signals such as muscle activations, joint torques, and contact forces. The approach achieves superior realism, physical plausibility, and controllability across HumanML3D, KIT-ML, and FLAG3D, aided by OpenSim-augmented data, while maintaining computational efficiency through latent-space diffusion. This work highlights the value of embedding biomechanical constraints directly into generative models to enable fine-grained control suitable for animation, robotics, and HCI, and points to future work on more complex dynamics and real-world data alignment.
Abstract
Lightweight, controllable, and physically plausible human motion synthesis is crucial for animation, virtual reality, robotics, and human-computer interaction applications. Existing methods often trade off computational efficiency, physical realism, and spatial controllability against one another. We propose FlexMotion, a novel framework that leverages a computationally lightweight diffusion model operating in the latent space, eliminating the need for physics simulators and enabling fast and efficient training. FlexMotion employs a multimodal pre-trained Transformer encoder-decoder that integrates joint locations, contact forces, joint actuations, and muscle activations to ensure the physical plausibility of the generated motions. FlexMotion also introduces a plug-and-play module that adds spatial controllability over a range of motion parameters (e.g., joint locations, joint actuations, contact forces, and muscle activations). Our framework achieves realistic motion generation with improved efficiency and control, setting a new benchmark for human motion synthesis. We evaluate FlexMotion on extended datasets and demonstrate its superior performance in terms of realism, physical plausibility, and controllability.
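
As a rough illustration of what "diffusion operating in the latent space" can mean in practice, the sketch below applies the standard DDPM forward noising step to a compact latent vector rather than to the full multimodal motion sequence. It is not the authors' implementation: the linear noise schedule, the number of steps, the latent dimension, and all function names are assumptions chosen for illustration, and the autoencoder that would produce the latent is replaced by a random stand-in.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per diffusion step (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s <= t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


# Hypothetical sizes: the paper's actual latent dimension is not given here.
LATENT_DIM = 8
NUM_STEPS = 1000

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(NUM_STEPS))

# Stand-in for the encoder output of one motion clip (assumption).
z0 = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# Noised latent at an intermediate step; a denoiser would be trained to predict eps.
zt, eps = noise_latent(z0, 500, alpha_bars, rng)
```

Because the noising and denoising act on a short latent vector instead of per-frame joint, force, and muscle signals, each diffusion step touches far fewer values, which is the intuition behind the efficiency claim.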
