LAGOON: Language-Guided Motion Control
Shusheng Xu, Huaijie Wang, Jiaxuan Gao, Yutao Ouyang, Chao Yu, Yi Wu
TL;DR
This work tackles the challenge of making robots follow high-level language commands with physically realistic motions in the real world. It introduces LAGOON, a multi-phase pipeline that first generates a language-conditioned human motion via a diffusion model, then retargets it to a robot, and finally trains a control policy in a physics simulator using reinforcement learning, with domain randomization to bridge the sim-to-real gap. A novel reward design combines an adversarial, semantics-consistent signal with a state-error alignment achieved through dynamic-programming-based matching, and PPO with augmented critic inputs accelerates learning. The approach is demonstrated on both humanoid and quadrupedal robots, including a real-world deployment, showing diverse, language-aligned behaviors and improved success rates over baselines, underscoring its practical potential for naturalistic, language-guided robotics.
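To make the dynamic-programming-based matching mentioned above more concrete, the following is a minimal sketch that aligns a retargeted reference motion with a policy rollout using classic dynamic time warping (DTW). DTW is used here only as a familiar stand-in: this summary does not specify the exact cost function or alignment constraints used in LAGOON, and the function name `dtw_align` and the squared-error cost are assumptions made for illustration.

```python
import numpy as np


def dtw_align(reference, rollout):
    """Align two state sequences with classic DTW (illustrative stand-in).

    reference: array of shape (n, d), e.g. retargeted reference frames.
    rollout:   array of shape (m, d), e.g. robot states from one episode.
    Returns the accumulated state error and the list of matched index pairs.
    """
    reference = np.asarray(reference, dtype=float)
    rollout = np.asarray(rollout, dtype=float)
    n, m = len(reference), len(rollout)

    # Squared state error between every reference frame and every rollout step.
    diff = reference[:, None, :] - rollout[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)

    # acc[i, j] = minimum accumulated error aligning the first i reference
    # frames with the first j rollout steps.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path
```

Under these assumptions, the accumulated error (or its per-step terms along `path`) could serve as a state-error signal that tolerates the robot executing the motion faster or slower than the reference.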
Abstract
We aim to control a robot so that it physically carries out any high-level language command, such as "cartwheel" or "kick", in the real world. Although human motion datasets exist, this task remains particularly challenging because generative models can produce physically unrealistic motions, a problem that is more severe for robots because of their different body structures and physical properties. Deploying such a motion on a physical robot is harder still because of the sim2real gap. We develop LAnguage-Guided mOtion cONtrol (LAGOON), a multi-phase reinforcement learning (RL) method that generates physically realistic robot motions from language commands. LAGOON first leverages a pretrained model to generate a human motion from a language command. An RL phase then trains a control policy in simulation to mimic the generated human motion. Finally, with domain randomization, the learned policy can be deployed to a quadrupedal robot, which can then perform diverse behaviors in the real world under natural language commands.
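As a rough illustration of the domain randomization step, the sketch below resamples a handful of physics parameters at the start of each training episode so that the policy does not overfit to one simulator configuration. The parameter names, the ranges, and the `sample_physics_params` helper are all hypothetical; the abstract does not state which quantities LAGOON randomizes or by how much.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    """Hypothetical set of simulator parameters to randomize per episode."""
    friction: float
    mass_scale: float
    motor_strength_scale: float
    action_latency_s: float


# Hypothetical (low, high) ranges, chosen only for illustration.
RANGES = {
    "friction": (0.4, 1.25),
    "mass_scale": (0.8, 1.2),
    "motor_strength_scale": (0.9, 1.1),
    "action_latency_s": (0.0, 0.04),
}


def sample_physics_params(rng: random.Random) -> PhysicsParams:
    """Draw each parameter independently and uniformly from its range."""
    values = {name: rng.uniform(low, high) for name, (low, high) in RANGES.items()}
    return PhysicsParams(**values)


# Usage: draw a fresh configuration before resetting the simulated robot.
rng = random.Random(0)
params = sample_physics_params(rng)
```

In a training loop, the sampled `params` would be written into the simulator before each reset, so the learned policy sees a spread of dynamics rather than a single nominal model.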
