Learning Rapid Turning, Aerial Reorientation, and Balancing using Manipulator as a Tail

Insung Yang; Jemin Hwangbo

TL;DR

This work tackles the challenge of augmenting quadruped locomotion by using a 6-DoF manipulator as a multifunctional tail, addressing drawbacks of dedicated tails such as added weight and cost. A PPO-based deep reinforcement learning controller with Actor and Critic networks governs a manipulator-mounted quadruped in RAISIM, coordinating rapid turning, aerial reorientation, and balancing through task-specific observations, actions, and composite reward structures. The key contributions include a detailed RL formulation for tail-enabled locomotion, a staged rapid-turning curriculum, and empirical evidence that the manipulator improves turning sharpness, aerial agility, and robustness to external disturbances. The findings suggest that integrating a manipulator as a tail can significantly enhance quadruped performance in the tested simulations, offering a path toward more capable and versatile legged robots, albeit requiring real-world validation and exploration of further applications.

Abstract

In this research, we investigated the innovative use of a manipulator as a tail in quadruped robots to augment their physical capabilities. Previous studies have primarily focused on enhancing various abilities by attaching robotic tails that function solely as tails on quadruped robots. While these tails improve the performance of the robots, they come with several disadvantages, such as increased overall weight and higher costs. To mitigate these limitations, we propose the use of a 6-DoF manipulator as a tail, allowing it to serve both as a tail and as a manipulator. To control this highly complex robot, we developed a controller based on reinforcement learning for the robot equipped with the manipulator. Our experimental results demonstrate that robots equipped with a manipulator outperform those without a manipulator in tasks such as rapid turning, aerial reorientation, and balancing. These results indicate that the manipulator can improve the agility and stability of quadruped robots, similar to a tail, in addition to its manipulation capabilities.

Paper Structure (25 sections, 5 equations, 8 figures, 3 tables)

INTRODUCTION
METHOD
Overview
Base
Observation
Action
Reward
Rapid Turning
Stage 1 (Yaw Adjustment in Standstill)
Stage 2 (Rapid Turning During Running)
Reward
Aerial Reorientation and Safe Landing
Inertia of tail
Termination
Reward
...and 10 more sections

Figures (8)

Figure 1: A quadruped robot, Mini Cheetah, equipped with a WidowX250S 6-DoF manipulator, is executing a rapid $135^{\circ}$ turn while running at 4.5 m/s. It learns to utilize the manipulator as a tail for sharp turns via reinforcement learning.
Figure 2: In the proposed reinforcement learning pipeline, two neural networks, namely the Actor and the Critic, are utilized. The Actor network generates the actions, while the Critic network computes the value function, which is essential for updating the Actor network. Both networks are structured as Multilayer Perceptrons (MLPs) and receive observations as inputs. These observations encompass two types: command, which is specified by the task, and robot state, which provides information about the robot's current state. The actions generated by the Actor are conveyed to the environment through a Proportional-Derivative (PD) controller, and the resulting reward, based on the action's effectiveness, is used to update the Actor network. This updating process employs the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017), refining the policy governed by the Actor network.
Figure 3: These graphs depict the comparison of trajectories for a robot with and without a manipulator executing a $135^\circ$ turn while running at 4.5 m/s. A dot is plotted every 0.05 seconds. The blue dots indicate the trajectory of the robot equipped with a manipulator following the turn command, while the red dots represent the trajectory of the robot without a manipulator after the turn command. Additionally, the light blue dots denote the ideal trajectory, which assumes an instantaneous turn upon command.
Figure 4: These graphs depict the trajectory results of a robot executing a $135^{\circ}$ turn while running at speeds of 3.0 m/s, 3.5 m/s, 4.0 m/s, and 4.5 m/s.
Figure 5: This graph shows the angle between the body z-axis and the world z-axis over time from $0\,\text{s}$ to $0.5\,\text{s}$. The red line shows the result of the robot without a manipulator, the blue line shows the robot with a WidowX250S, and the green line shows the robot with a ViperX300S.
...and 3 more figures
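
Two of the quantities plotted in the figures above are simple enough to sketch in code: the ideal trajectory of Figure 3 (an instantaneous turn, sampled every 0.05 s) and the tilt angle of Figure 5 (the angle between the body z-axis and the world z-axis). The following is a minimal illustration, not the authors' implementation. The function names, the choice of the turn point as the origin, the initial heading along +x, the counterclockwise sign of the turn, and the `(w, x, y, z)` quaternion ordering are all assumptions made for this example.

```python
import math


def ideal_trajectory(speed, turn_deg, duration, dt=0.05):
    """Sample points of an idealized turn (hypothetical helper).

    Assumes the robot is at the origin heading along +x when the turn
    command arrives, and that its heading changes instantaneously by
    turn_deg (counterclockwise) while its speed stays constant.
    """
    heading = math.radians(turn_deg)
    steps = int(round(duration / dt))
    points = []
    for k in range(steps + 1):
        distance = speed * k * dt  # distance travelled since the command
        points.append((distance * math.cos(heading),
                       distance * math.sin(heading)))
    return points


def tilt_angle_deg(quat_wxyz):
    """Angle in degrees between the body z-axis and the world z-axis.

    Assumes a unit quaternion ordered (w, x, y, z) that rotates body-frame
    vectors into the world frame. The cosine of the angle is the (3, 3)
    entry of the corresponding rotation matrix, 1 - 2 (x^2 + y^2).
    """
    w, x, y, z = quat_wxyz
    cos_tilt = 1.0 - 2.0 * (x * x + y * y)
    cos_tilt = max(-1.0, min(1.0, cos_tilt))  # guard against rounding error
    return math.degrees(math.acos(cos_tilt))


if __name__ == "__main__":
    # Ideal path for a 135 degree turn at 4.5 m/s over 0.5 s.
    for px, py in ideal_trajectory(4.5, 135.0, 0.5):
        print(f"{px:+.3f} {py:+.3f}")
    # Tilt of a body rolled by 90 degrees about its x-axis.
    half = math.radians(90.0) / 2.0
    print(tilt_angle_deg((math.cos(half), math.sin(half), 0.0, 0.0)))
```

Under these assumptions, the ideal path is a straight line along the new heading, so the gap between it and a measured trajectory gives a direct picture of how much the robot overshoots during the turn.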
