HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

TL;DR

HiLMa-Res presents a general, hierarchical reinforcement learning framework that decouples locomotion control from manipulation planning for quadrupedal loco-manipulation. A task-independent operational-space locomotion controller tracks end-effector trajectories using nominal CPG-based motion plus residual Bézier corrections, while a task-specific planner outputs residual trajectories and base commands to accomplish diverse tasks. The system demonstrates ball dribbling, stepping over obstacles, and load navigation across simulation and real-world settings, with real-world fine-tuning via data-efficient methods and favorable comparisons to baselines. This modular approach enables fast adaptation to new loco-manipulation tasks and supports different observation modalities, including vision, making it practical for real-world deployment on quadrupeds.

Abstract

This work presents HiLMa-Res, a hierarchical framework that leverages reinforcement learning to tackle manipulation tasks while a quadrupedal robot performs continuous locomotion. Unlike most previous efforts, which focus on solving a specific task, HiLMa-Res is designed to be general across loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The design of this framework addresses the challenges of integrating continuous locomotion control with manipulation using the legs. It develops an operational-space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general across downstream tasks and can therefore be reused by a high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we use HiLMa-Res to tackle several challenging loco-manipulation tasks with a quadrupedal robot in the real world. These tasks range from state-based to vision-based policies, and from training purely on simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.

Paper Structure (40 sections, 6 figures, 3 tables)

This paper contains 40 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The proposed HiLMa-Res framework enables a quadrupedal robot to perform different loco-manipulation tasks in the real world. These include dribbling a ball in a desired direction, stepping over small blocks scattered on the ground, and navigating a load to the desired goal through real-world learning. We highlight the versatility of the HiLMa-Res framework in various loco-manipulation tasks with different observation spaces and learning algorithms.
  • Figure 2: The HiLMa-Res framework. This hierarchical framework has two stages. In the controller training stage, we train a task-independent locomotion controller that tracks desired end-effector trajectories, each formed by adding a sampled CPG trajectory and a sampled Bézier residual trajectory. In the planner training stage, the locomotion controller is reused to train a task-specific manipulation planner for downstream loco-manipulation tasks. We highlight the importance of reusing a pre-trained locomotion controller that has been evaluated in the real-world environment, which enables fast and efficient learning of the planner and prevents learning dynamically infeasible actions.
  • Figure 3: In this work, the quadrupedal robot employs a trotting gait, characterized by two diagonal legs swinging simultaneously. Each swing leg has a desired end-effector trajectory, which is the sum of a nominal CPG trajectory $\bm{\xi}_n$ and a residual Bézier trajectory $\bm{\xi}_r$. By learning to change the control points of the Bézier trajectories and the base movement, the policy can adjust the swing trajectories to perform various loco-manipulation tasks.
  • Figure 4: Recorded data during NavLoad experiments, along with snapshots capturing the robot's behavior during its first interaction with the load. (a) Zero-shot transfer of the base policy trained in simulation: the robot tends to take a large detour when moving the load to the target because of a large sim-to-real gap (for example, in friction and sensor noise). (b) Training with real-world data using RLPD: the robot first adjusts its pose and then pushes the load along the direction of the goal, with a shorter path and operation time. This showcases the advantages of HiLMa-Res in a fine-grained manipulation task that requires efficient training from real-world data.
  • Figure 5: Visualization of real-world ball dribbling experiments using the proposed HiLMa-Res framework. The quadruped can perform a sharp U-turn within a narrow space less than 3.3 meters wide. This demonstrates that HiLMa-Res can enable agile loco-manipulation maneuvers that transfer directly from simulation to the real world.
  • ...and 1 more figure
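
The trajectory composition described in the Figure 3 caption (a nominal trajectory $\bm{\xi}_n$ plus a residual Bézier trajectory $\bm{\xi}_r$) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the sinusoidal stand-in for the CPG swing path, the step length and height, the cubic order of the Bézier curve, and the control-point values are all hypothetical.

```python
from math import comb, pi, sin


def nominal_swing(phase, step_length=0.10, step_height=0.05):
    """Hypothetical stand-in for the nominal swing path xi_n (not the paper's CPG).

    Returns a toe position (x, y, z) in metres for a swing phase in [0, 1].
    """
    x = step_length * (phase - 0.5)    # sweep from back to front
    z = step_height * sin(pi * phase)  # lift, then lower
    return (x, 0.0, z)


def bezier(control_points, s):
    """Evaluate a Bezier curve at s in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1
    point = [0.0, 0.0, 0.0]
    for i, p in enumerate(control_points):
        weight = comb(n, i) * s**i * (1.0 - s) ** (n - i)
        for k in range(3):
            point[k] += weight * p[k]
    return tuple(point)


def desired_toe(phase, residual_points):
    """Desired toe position: xi_n(phase) + xi_r(phase)."""
    xi_n = nominal_swing(phase)
    xi_r = bezier(residual_points, phase)
    return tuple(a + b for a, b in zip(xi_n, xi_r))


# Hypothetical residual control points. The first and last are zero, so the
# residual vanishes at liftoff and touchdown and only reshapes mid-swing.
residual = [
    (0.00, 0.0, 0.00),
    (0.00, 0.0, 0.04),
    (0.02, 0.0, 0.04),
    (0.00, 0.0, 0.00),
]

for phase in (0.0, 0.5, 1.0):
    print(phase, desired_toe(phase, residual))
```

Because a Bézier curve passes through its first and last control points, setting both to zero in this sketch leaves the nominal footholds unchanged while the interior control points raise or shift the swing. In the framework as described, a planner would supply such control points together with base commands; how they are parameterized in the paper is not specified in this summary.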