Dynamic Legged Ball Manipulation on Rugged Terrains with Hierarchical Reinforcement Learning

Dongjie Zhu, Zhuo Yang, Tianhang Wu, Luzhou Ge, Xuesong Li, Qi Liu, Xiang Li

TL;DR

This work tackles dynamic ball manipulation by quadruped robots on rugged terrains with a hierarchical reinforcement learning framework. A high-level policy selects among four pre-trained low-level skills (two dribbling and two locomotion), using a context-aided estimator to adapt to terrain and ball state. The authors introduce Dynamic Skill-Focused Policy Optimization (DSF-PO) to address the learning inefficiency that arises from mixed discrete-continuous actions. In simulation, the method learns more efficiently and dribbles across terrains more reliably than the baselines, and the trained policy transfers zero-shot to real hardware. The authors point to disaster response and robot sports as potential applications of autonomous legged loco-manipulation.

Abstract

Advancing the dynamic loco-manipulation capabilities of quadruped robots in complex terrains is crucial for performing diverse tasks. Specifically, dynamic ball manipulation in rugged environments presents two key challenges. The first is coordinating distinct motion modalities to integrate terrain traversal and ball control seamlessly. The second is overcoming sparse rewards in end-to-end deep reinforcement learning, which impede efficient policy convergence. To address these challenges, we propose a hierarchical reinforcement learning framework. A high-level policy, informed by proprioceptive data and ball position, adaptively switches between pre-trained low-level skills such as ball dribbling and rough terrain navigation. We further propose Dynamic Skill-Focused Policy Optimization to suppress gradients from inactive skills and enhance critical skill learning. Both simulation and real-world experiments validate that our methods outperform baseline approaches in dynamic ball manipulation across rugged terrains, highlighting their effectiveness in challenging environments. Videos are on our website: dribble-hrl.github.io.
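The abstract states only that DSF-PO suppresses gradients from inactive skills; it does not give the loss. As a minimal sketch of one way such suppression could work for a hybrid action (a discrete skill index plus a continuous command vector), the log-probability below masks out the command dimensions that belong to skills the high-level policy did not select. The function names, the four-skill layout, and the two command dimensions per skill are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_log_prob(x, mean, std):
    """Element-wise log density of a diagonal Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def masked_hybrid_log_prob(skill_logits, skill_idx, cmd, cmd_mean, cmd_std,
                           dims_per_skill=2):
    """Log-probability of a hybrid action with inactive-skill dimensions masked.

    Hypothetical layout (an assumption): the command vector has
    n_skills * dims_per_skill entries, and skill k (zero-based) owns the
    slice [k * dims_per_skill, (k + 1) * dims_per_skill).
    """
    # Discrete part: log-softmax over skill logits, evaluated at the chosen skill.
    shifted = skill_logits - skill_logits.max(axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    batch = np.arange(skill_idx.shape[0])
    discrete_lp = log_softmax[batch, skill_idx]

    # Continuous part: keep only the dimensions owned by the selected skill.
    n_skills = skill_logits.shape[-1]
    owner = np.repeat(np.arange(n_skills), dims_per_skill)
    mask = (owner[None, :] == skill_idx[:, None]).astype(cmd.dtype)
    continuous_lp = (gaussian_log_prob(cmd, cmd_mean, cmd_std) * mask).sum(axis=-1)

    return discrete_lp + continuous_lp, mask


# Usage: a batch of two transitions with four skills, two command dims each.
logits = np.zeros((2, 4))
chosen = np.array([0, 2])
commands = np.zeros((2, 8))
log_prob, mask = masked_hybrid_log_prob(
    logits, chosen, commands, np.zeros((2, 8)), np.ones((2, 8))
)
```

Because masked dimensions contribute a constant zero to the log-probability, a policy-gradient loss built on it would pass no gradient to the command outputs of unselected skills. Whether DSF-PO masks in exactly this way, or reweights the remaining terms, is not stated in the text above.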

Paper Structure

This paper contains 25 sections, 12 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Demonstration of legged ball dribbling with hierarchical framework. The high-level policy selects and coordinates pre-trained dribbling and locomotion skills for dynamic ball manipulation, optimized through deep RL. We deploy the trained policy in the real world via zero-shot transfer, enabling the robot to perform cross-terrain dribbling.
  • Figure 2: Proposed hierarchical framework. The figure illustrates that when the high-level actor outputs $\mathbf{d}_t=1$, only $\pi_1^L$ is activated, with the first two dimensions of $\mathbf{c}_t^L$ provided as input. The context-aided estimator network $\boldsymbol{\phi}$ and all low-level skills $\pi^L$ are frozen during training.
  • Figure 3: Training curves of PPO with DSF-PO compared to standard PPO. The shaded regions indicate the standard deviation over multiple runs.
  • Figure 4: Cross-terrain dribbling performance evaluation. (a) A trajectory schematic of the ball dribbling across five terrains in sequence: stair descent, ramp-down, rough terrain, ramp-up, and flat ground. Each terrain measures 10 m per side. (b) Visualization of the invocation of different low-level skills, where each thin vertical line represents a single invocation. (c, d) Visualization of the ball's velocity magnitude and direction. The horizontal axis represents the robot's traveled horizontal distance.
  • Figure 5: Usage frequency of low-level skills across different terrains. The numbers represent the proportion of each low-level skill's usage frequency on a given terrain.
  • ...and 1 more figures
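
The Figure 2 caption describes the dispatch rule: when the high-level actor outputs $\mathbf{d}_t=1$, only $\pi_1^L$ runs, and it receives the first two dimensions of $\mathbf{c}_t^L$. A minimal sketch of that routing follows, assuming each skill owns a contiguous two-dimensional slice of the command vector (the caption confirms this only for the first skill). The stub skills and all names are hypothetical stand-ins for the frozen low-level policies.

```python
from typing import Callable, Sequence

import numpy as np

# A low-level skill maps (observation, sub-command) to an action.
Skill = Callable[[np.ndarray, np.ndarray], np.ndarray]


def run_active_skill(skill_idx: int, command: np.ndarray, obs: np.ndarray,
                     skills: Sequence[Skill], dims_per_skill: int = 2) -> np.ndarray:
    """Run only the selected skill on its own slice of the command vector.

    skill_idx is zero-based here, so the caption's d_t = 1 maps to index 0.
    """
    if not 0 <= skill_idx < len(skills):
        raise IndexError(f"skill index {skill_idx} out of range")
    if command.shape[-1] != len(skills) * dims_per_skill:
        raise ValueError("command length does not match the skill layout")
    start = skill_idx * dims_per_skill
    sub_command = command[start:start + dims_per_skill]
    return skills[skill_idx](obs, sub_command)


def make_stub_skill(gain: float) -> Skill:
    """Hypothetical placeholder for a frozen pre-trained skill."""
    def skill(obs: np.ndarray, sub_command: np.ndarray) -> np.ndarray:
        return gain * sub_command
    return skill


# Usage: four stub skills, matching the two dribbling and two locomotion skills.
skills = [make_stub_skill(g) for g in (1.0, 2.0, 3.0, 4.0)]
command = np.arange(8, dtype=float)
action = run_active_skill(0, command, np.zeros(3), skills)
```

Only one skill is evaluated per step, so the command entries of the other skills have no effect on the action.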