Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Yutao Ouyang, Jinhan Li, Yunfei Li, Zhongyu Li, Chao Yu, Koushil Sreenath, Yi Wu

TL;DR

This work tackles long-horizon loco-manipulation for quadrupedal robots by combining a cascade of large language models for high-level planning with a pool of low-level skills trained by reinforcement learning. The high-level module decomposes tasks into hybrid discrete-continuous plans using a semantic planner, a parameter calculator, and a code generator, which together ground the plan in executable robot code, plus a replanner that handles execution failures and human interventions; the low-level layer learns locomotion and manipulation policies through PPO and hierarchical RL, including recovery strategies. The system demonstrates multi-step strategies such as building tools or requesting human help, achieving over 70% success in simulation and successful real-world deployment on a CyberDog2 after domain randomization. These results underscore the importance of branching, replanning, and policy chaining for robust long-horizon autonomy in complex, unstructured environments.

Abstract

We present a large language model (LLM) based system to empower quadrupedal robots with problem-solving abilities for long-horizon tasks beyond short-term motions. Long-horizon tasks for quadrupeds are challenging since they require both a high-level understanding of the semantics of the problem for task planning and a broad range of locomotion and manipulation skills to interact with the environment. Our system builds a high-level reasoning layer with large language models, which generates hybrid discrete-continuous plans as robot code from task descriptions. It comprises multiple LLM agents: a semantic planner that sketches a plan, a parameter calculator that predicts arguments in the plan, a code generator that converts the plan into executable robot code, and a replanner that handles execution failures or human interventions. At the low level, we adopt reinforcement learning to train a set of motion planning and control skills to unleash the flexibility of quadrupeds for rich environment interactions. Our system is tested on long-horizon tasks that are infeasible to complete with one single skill. Simulation and real-world experiments show that it successfully figures out multi-step strategies and demonstrates non-trivial behaviors, including building tools or notifying a human for help. Demos are available on our project page: https://sites.google.com/view/long-horizon-robot.
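The cascade of agents described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the skill names (`walk_to`, `push`), the argument strings, and the `robot.` prefix are all hypothetical, and the three agents are plain stand-in functions rather than real LLM calls. It only shows the data flow: a discrete plan, then continuous arguments, then code.

```python
from typing import Callable, Dict, List

# Hypothetical agent signatures; in the paper each role is played by an LLM.
Planner = Callable[[str], List[str]]                    # task -> ordered skill names
Calculator = Callable[[str, List[str]], Dict[str, str]]  # task, skills -> arguments per skill
Coder = Callable[[List[str], Dict[str, str]], str]       # skills, arguments -> robot code


def generate_plan_code(task: str, planner: Planner,
                       calculator: Calculator, coder: Coder) -> str:
    """Chain the three agents: sketch a plan, fill in arguments, emit code."""
    steps = planner(task)            # discrete part of the plan
    args = calculator(task, steps)   # continuous part of the plan
    return coder(steps, args)        # plan summarized as code


# Stand-in agents with fixed outputs (assumptions, for illustration only).
def stub_planner(task: str) -> List[str]:
    return ["walk_to", "push"]


def stub_calculator(task: str, steps: List[str]) -> Dict[str, str]:
    return {"walk_to": "x=1.0, y=0.5", "push": "distance=0.3"}


def stub_coder(steps: List[str], args: Dict[str, str]) -> str:
    return "\n".join(f"robot.{step}({args[step]})" for step in steps)


plan_code = generate_plan_code("move the box to the wall",
                               stub_planner, stub_calculator, stub_coder)
print(plan_code)
```

Keeping each agent behind its own callable makes it possible to swap or test one stage without touching the others, which is one plausible reason for splitting planning, parameter prediction, and code generation into separate agents.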

Paper Structure (20 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overview of the hierarchical system for long-horizon loco-manipulation tasks. The system consists of a reasoning layer for task planning and a controlling layer for skill execution. Given the language description of a long-horizon task, three cascaded LLM agents perform high-level task decomposition and generate a complete plan. A replanner is introduced to handle unexpected situations. The reasoning layer generates function calls for parameterized robot skills, and the controlling layer instantiates the mid-level motion planning and low-level control skills with RL.
  • Figure 2: Illustration of the LLM-based high-level reasoning layer that generates primary hybrid discrete-continuous plans from the task description in language. It is composed of a semantic planner that proposes a solution consisting of branches conditioned on environment specifications and primitive actions, a parameter calculator that fills in arguments for the actions, and a code generator that summarizes the plan as executable robot code. The text shown is abbreviated output generated by the LLMs.
  • Figure 3: Versatile behaviors driven by the RL skill repertoire.
  • Figure 4: Total number of occurrences for different error types in generated primary plan code, with ten samples per variant. Spatial error refers to errors in perceiving spatial relations leading to miscalculations, constraint violation refers to ignoring the given constraints, and logical error refers to faulty logical reasoning. The lower part shows the execution of the 4 tasks in simulation.
  • Figure 5: Replan deployment. a) Replan on execution failure: After a strategy fails, the robot uses a language-conditioned error detection pipeline to gather spatial information and adjust the parameters. b) Replan on interruption: When the human is unable to open the door but allows the robot to operate the door switch, the robot interrupts the original plan and presses the button to open the door, generating a new code plan while still achieving the long-term delivery goal.
  • ...and 1 more figure
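The replan-on-failure behavior described in the Figure 5 caption can be sketched as a retry loop. This is an illustrative assumption rather than the authors' pipeline: `execute`, `replan`, the `push_cm` parameter, the 50 cm target, and the attempt limit are all hypothetical, and the error detection step is reduced to a numeric feedback value.

```python
from typing import Callable, Dict, Tuple

Plan = Dict[str, int]  # hypothetical plan: parameter name -> value (centimetres)


def execute_with_replan(plan: Plan,
                        execute: Callable[[Plan], Tuple[bool, int]],
                        replan: Callable[[Plan, int], Plan],
                        max_attempts: int = 3) -> Tuple[bool, Plan, int]:
    """Run a plan; on failure, adjust its parameters from feedback and retry."""
    for attempt in range(1, max_attempts + 1):
        success, feedback = execute(plan)
        if success:
            return True, plan, attempt
        if attempt < max_attempts:
            plan = replan(plan, feedback)  # adjust parameters before retrying
    return False, plan, max_attempts


# Stand-in environment: the push succeeds once the box has moved 50 cm.
TARGET_CM = 50


def stub_execute(plan: Plan) -> Tuple[bool, int]:
    gap = TARGET_CM - plan["push_cm"]
    return gap <= 0, max(gap, 0)  # feedback is the remaining distance


def stub_replan(plan: Plan, feedback: int) -> Plan:
    return {"push_cm": plan["push_cm"] + feedback}


ok, final_plan, attempts = execute_with_replan({"push_cm": 20},
                                               stub_execute, stub_replan)
print(ok, final_plan, attempts)
```

In this sketch the first attempt falls 30 cm short, the replanner adds that gap to the push distance, and the second attempt succeeds.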