
LAGOON: Language-Guided Motion Control

Shusheng Xu, Huaijie Wang, Jiaxuan Gao, Yutao Ouyang, Chao Yu, Yi Wu

TL;DR

This work tackles the challenge of making robots follow high-level language commands with physically realistic motions in the real world. It introduces LAGOON, a multi-phase pipeline that first generates a language-conditioned human motion via a diffusion model, then retargets it to a robot, and finally trains a control policy in a physics simulator using reinforcement learning, with domain randomization to bridge the sim-to-real gap. A novel reward design combines an adversarial, semantics-consistent signal with a state-error alignment achieved through dynamic-programming-based matching, and PPO with augmented critic inputs accelerates learning. The approach is demonstrated on both humanoid and quadrupedal robots, including a real-world deployment, showing diverse, language-aligned behaviors and improved success rates over baselines, underscoring its practical potential for naturalistic, language-guided robotics.
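To make the "dynamic-programming-based matching" idea concrete, here is a minimal sketch of a monotonic alignment between a policy rollout and a reference motion, in the style of dynamic time warping. It is an illustration under assumptions, not the paper's formulation: the function name `aligned_state_error`, the squared-Euclidean per-frame distance, and the three-way recurrence are all hypothetical choices made for this example.

```python
from math import inf


def aligned_state_error(rollout, reference):
    """Return the minimum total squared error over monotonic alignments.

    `rollout` and `reference` are lists of equal-dimension state vectors.
    Hypothetical helper: a DTW-style recurrence, assumed for illustration.
    """
    def dist(a, b):
        # Squared Euclidean distance between two state vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n, m = len(rollout), len(reference)
    # cost[i][j]: best cost of aligning the first i rollout frames
    # with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(rollout[i - 1], reference[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # rollout lingers on the same reference frame
                cost[i][j - 1],      # reference advances while rollout stays
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# A rollout that pauses on the first pose still matches the reference exactly.
print(aligned_state_error([[0.0], [0.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]]))
```

The point of such an alignment is that the rollout is not penalized for being slower or faster than the reference, only for visiting different states; a reward could then be derived from the negated alignment cost.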

Abstract

We aim to control a robot to physically behave in the real world following any high-level language command like "cartwheel" or "kick". Although human motion datasets exist, this task remains particularly challenging since generative models can produce physically unrealistic motions, which will be more severe for robots due to different body structures and physical properties. Deploying such a motion to a physical robot can cause even greater difficulties due to the sim2real gap. We develop LAnguage-Guided mOtion cONtrol (LAGOON), a multi-phase reinforcement learning (RL) method to generate physically realistic robot motions under language commands. LAGOON first leverages a pretrained model to generate a human motion from a language command. Then an RL phase trains a control policy in simulation to mimic the generated human motion. Finally, with domain randomization, our learned policy can be deployed to a quadrupedal robot, leading to a quadrupedal robot that can take diverse behaviors in the real world under natural language commands.

Paper Structure (24 sections, 12 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: We developed a system called LAGOON. Given a high-level language command, LAGOON can autonomously train a policy to control the quadrupedal robot to take actions according to the provided command.
  • Figure 2: Overview of the multi-phase method LAnguage-Guided mOtion cONtrol (LAGOON). To make a robot follow a language command such as "throw a ball", we first generate a human motion using the motion generation model. Then the human motion can be retargeted to a robot skeleton that differs largely from humans. By introducing RL training, we train a robust control policy in the physics simulator. Finally, we deploy the control policy to the real-world robot.
  • Figure 3: Joint mappings from the SMPL skeleton to the quadrupedal robot. Note that 16 and 17 in the quadrupedal skeleton each correspond to 2 joints.
  • Figure 4: The reference motion sequence overlooks the laws of physics. The trained policies robustly perform the "cartwheel" motion even on complex terrain (Wave) or with a different skeleton that has shorter arms (Short Hand).
  • Figure 5: The task of "throw a ball" on the quadrupedal robot. Even though getting up does not appear in the reference motion, the quadrupedal robot learns how to get up from the ground and then wave its front legs.
  • ...and 1 more figures