Playful DoggyBot: Learning Agile and Precise Quadrupedal Locomotion

Xin Duan, Ziwen Zhuang, Hang Zhao, Soeren Schwertfeger

TL;DR

Playful DoggyBot addresses the challenge of achieving both agility and precision in quadrupedal object interaction by using a perception-control decoupled RL framework and a memory-equipped policy to track and catch small, fast-moving objects during high-dynamic locomotion. The approach combines dual reward terms for agility and precision, a curriculum over target heights, and a PD-based real-world deployment pipeline. The policy is trained in simulation and transferred to a real Unitree Go2 fitted with a passive mouth-like gripper. Results show the robot can track targets moving at up to $3\ \mathrm{m/s}$ and catch objects at heights of up to $1.05\ \mathrm{m}$ in simulation and $0.8\ \mathrm{m}$ in the real world, with GRU and Transformer backbones outperforming MLP in most settings. The work highlights a sim-to-real gap driven by perception latency and sensing noise, and points to future integration with vision-language models to expand dynamic object interaction capabilities.

Abstract

Quadrupedal animals can perform agile and playful tasks while interacting with real-world objects. For instance, a trained dog can track and catch a flying frisbee before it touches the ground, while a cat left alone at home may leap to grasp the door handle. Successfully grasping an object during high-dynamic locomotion requires highly precise perception and control. However, due to hardware limitations, agility and precision are usually a trade-off in robotics problems. In this work, we employ a perception-control decoupled system based on Reinforcement Learning (RL), aiming to explore the level of precision a quadrupedal robot can achieve while interacting with objects during high-dynamic locomotion. Our experiments show that our quadrupedal robot, mounted with a passive gripper in front of the robot's chassis, can perform both tracking and catching tasks similar to a real trained dog. The robot can follow a mid-air ball moving at speeds of up to 3 m/s, and it can leap and successfully catch a small object hanging above it at a height of 1.05 m in simulation and 0.8 m in the real world.

Paper Structure (20 sections, 4 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Playful DoggyBot. We present a system to explore the agile and precise movements of quadrupedal robots. A robot dog mounted with a mouth-like gripper can finish the challenging task of leaping up to catch a small target object. Code and videos are available on the Project Webpage: https://playful-doggybot.github.io/.
  • Figure 2: System Framework (Pipeline). We use the policy network trained in the simulation to map the observation input, which includes proprioception and the goal position coordinates computed using the object detector, into goal joint angles. Then the PD controller computes the motor torques with respect to the goal joint angles, current joint angles, and joint velocities, and applies them to the real robot.
  • Figure 3: Hardware Setup. Maintaining consistency in the collision shape of the front gripper between simulation and reality helps the robot learn to correctly trigger contact with the ball using the gripper at the appropriate position.
  • Figure 4: Comparisons. (a) We examine the success rate of the GRU policy in catching the target object at various heights while uniform noise is added to the goal position. (b) We analyze how the success rate varies with the number of different target heights used during training, across three backbone architectures. (c) and (d) We evaluate the success rate depending on whether the input includes the absolute target height, again comparing the three backbone models.
  • Figure 5: Curriculum Learning. When broadening the height range of objects that robots can grasp through curriculum settings, we observe that the success rate for a specific height may decline due to a reduced number of episodes associated with that height.
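
The PD step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration of a standard joint-space PD law, not the authors' deployment code: the function name `pd_torques`, the gain values `kp` and `kd`, and the optional `tau_limit` clipping are all assumptions made for the example.

```python
import numpy as np

def pd_torques(q_goal, q, qd, kp=20.0, kd=0.5, tau_limit=None):
    """Joint-space PD law: tau = kp * (q_goal - q) - kd * qd.

    q_goal: goal joint angles output by the policy (rad)
    q:      current joint angles (rad)
    qd:     current joint velocities (rad/s)
    kp, kd: hypothetical gains, not values taken from the paper
    tau_limit: optional symmetric torque limit (assumption, N*m)
    """
    q_goal = np.asarray(q_goal, dtype=float)
    q = np.asarray(q, dtype=float)
    qd = np.asarray(qd, dtype=float)

    # Proportional term pulls toward the goal angle;
    # derivative term damps the joint velocity.
    tau = kp * (q_goal - q) - kd * qd

    # Saturate the command if a torque limit is supplied.
    if tau_limit is not None:
        tau = np.clip(tau, -tau_limit, tau_limit)
    return tau

# Usage with three illustrative joints (a real quadruped has more).
tau = pd_torques(q_goal=[0.1, 0.0, 0.0],
                 q=[0.0, 0.0, 0.0],
                 qd=[0.0, 0.0, 1.0])
print(tau)
```

In this arrangement the policy only has to output goal joint angles at its own rate, while the PD loop turns them into torques from the latest joint angles and velocities.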