HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

Xiaoyu Huang; Qiayuan Liao; Yiming Ni; Zhongyu Li; Laura Smith; Sergey Levine; Xue Bin Peng; Koushil Sreenath

TL;DR

HiLMa-Res presents a general, hierarchical reinforcement learning framework that decouples locomotion control from manipulation planning for quadrupedal loco-manipulation. A task-independent operational-space locomotion controller tracks end-effector trajectories using nominal CPG-based motion plus residual Bézier corrections, while a task-specific planner outputs residual trajectories and base commands to accomplish diverse tasks. The system demonstrates ball dribbling, stepping over obstacles, and load navigation across simulation and real-world settings, with real-world fine-tuning via data-efficient methods and favorable comparisons to baselines. This modular approach enables fast adaptation to new loco-manipulation tasks and supports different observation modalities, including vision, making it practical for real-world deployment on quadrupeds.

Abstract

This work presents HiLMa-Res, a hierarchical framework that leverages reinforcement learning to tackle manipulation tasks while performing continuous locomotion with quadrupedal robots. Unlike most previous efforts, which focus on solving a specific task, HiLMa-Res is designed to be general across loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The design of this framework addresses the challenges of integrating continuous locomotion control with manipulation using legs. It develops an operational-space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general across downstream tasks and can therefore be used by a high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we use HiLMa-Res to tackle several challenging loco-manipulation tasks with a quadrupedal robot in the real world. These tasks span state-based and vision-based policies, and range from training purely on simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.

Paper Structure (40 sections, 6 figures, 3 tables)

Introduction
Contributions
Related Work
The HiLMa-Res Framework and Loco-Manipulation Tasks
Task-independent Quadrupedal Locomotion Control in Operational Space
Parameterized Reference Trajectory in Operational Space
Nominal Trajectories from CPG
Residual Trajectories Represented by Bézier Curves
Control Objective
Control Environment
Action
Observation
Goal
Reward
Task Randomization
...and 25 more sections

Figures (6)

Figure 1: The proposed HiLMa-Res framework enables a quadrupedal robot to perform different loco-manipulation tasks in the real world. These include dribbling a ball in a desired direction, stepping over small blocks scattered on the ground, and navigating a load to the desired goal through real-world learning. We highlight the versatility of the HiLMa-Res framework in various loco-manipulation tasks with different observation spaces and learning algorithms.
Figure 2: The HiLMa-Res framework. This hierarchical framework consists of two stages. In the controller training stage, we train a task-independent locomotion controller that tracks desired end-effector trajectories, each formed by adding a sampled CPG trajectory and a sampled Bézier residual trajectory. In the planner training stage, the locomotion controller is reused to train a task-specific manipulation planner for downstream loco-manipulation tasks. We highlight the importance of reusing a pre-trained locomotion controller that has been evaluated in the real-world environment, which enables fast and efficient learning of the planner and prevents learning dynamically infeasible actions.
Figure 3: In this work, the quadrupedal robot employs a trotting gait, characterized by two diagonal legs swinging simultaneously. For each swing leg, there is a desired end-effector trajectory, which is the sum of a nominal CPG trajectory $\bm{\xi}_n$ and a residual Bézier trajectory $\bm{\xi}_r$. By learning to change the control points of the Bézier trajectories and the base movement, the policy can adjust the swing trajectories to perform various loco-manipulation tasks.
Figure 4: Recorded data during NavLoad experiments, along with snapshots capturing the robot's behavior during its first interaction with the load. (a) Under zero-shot transfer of the base policy trained in simulation, the robot tends to take a large detour to move the load to the target due to a large sim-to-real gap (e.g., friction and sensor noise). (b) After training with real-world data using RLPD, the robot first adjusts its pose and then pushes the load along the direction of the goal, with a shorter path and operation time. This showcases the advantages of HiLMa-Res in a fine-grained manipulation task that requires efficient training from real-world data.
Figure 5: Visualization of real-world ball dribbling experiments using the proposed HiLMa-Res framework. The quadruped can perform a sharp U-turn within a narrow space less than 3.3 meters wide. This demonstrates that HiLMa-Res can enable agile loco-manipulation maneuvers that transfer directly from simulation to the real world.
...and 1 more figure
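
The Figure 3 caption describes each swing-leg target as the sum of a nominal CPG trajectory and a residual Bézier trajectory whose control points are chosen by the policy. The following is a minimal sketch of that composition only, not the authors' implementation. The half-sine nominal swing, the step length and height, the cubic curve degree, and all function names are illustrative assumptions; pinning the first and last control points to zero (so the residual vanishes at lift-off and touchdown) is likewise an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) toe offset in meters


def bezier(control_points: Sequence[Point], s: float) -> Point:
    """Evaluate a Bezier curve at phase s in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1
    out = [0.0, 0.0, 0.0]
    for i, p in enumerate(control_points):
        weight = math.comb(n, i) * (s ** i) * ((1.0 - s) ** (n - i))
        for k in range(3):
            out[k] += weight * p[k]
    return (out[0], out[1], out[2])


def nominal_swing(s: float, step_length: float = 0.10,
                  step_height: float = 0.05) -> Point:
    """Hypothetical stand-in for the nominal CPG swing trajectory.

    The toe moves forward linearly and lifts along a half-sine arc.
    """
    x = step_length * (s - 0.5)
    z = step_height * math.sin(math.pi * s)
    return (x, 0.0, z)


def desired_toe_position(s: float, residual_cps: Sequence[Point]) -> Point:
    """Desired toe position: nominal trajectory plus Bezier residual."""
    nominal = nominal_swing(s)
    residual = bezier(residual_cps, s)
    return (nominal[0] + residual[0],
            nominal[1] + residual[1],
            nominal[2] + residual[2])


# Example residual (assumed values): shift the toe sideways and lift it
# higher mid-swing, e.g. to clear an obstacle.
example_cps = [(0.0, 0.0, 0.0), (0.0, 0.02, 0.03),
               (0.0, 0.02, 0.03), (0.0, 0.0, 0.0)]
mid_swing = desired_toe_position(0.5, example_cps)
```

With all control points set to zero, `desired_toe_position` reduces to the nominal swing, which is the sense in which the residual is a correction on top of the nominal gait.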
