Two-Stage Learning of Highly Dynamic Motions with Rigid and Articulated Soft Quadrupeds

Francesco Vezzi, Jiatao Ding, Antonin Raffin, Jens Kober, Cosimo Della Santina

TL;DR

This study relies on a simple yet effective two-stage learning framework to generate dynamic motions for quadrupedal robots, which proves particularly effective for articulated soft quadrupeds, whose inherent compliance and adaptability make them ideal for dynamic tasks but also introduce unique control challenges.

Abstract

Controlled execution of dynamic motions in quadrupedal robots, especially those with articulated soft bodies, presents a unique set of challenges that traditional methods struggle to address efficiently. In this study, we tackle these issues by relying on a simple yet effective two-stage learning framework to generate dynamic motions for quadrupedal robots. First, a gradient-free evolution strategy is employed to discover simply represented control policies, eliminating the need for a predefined reference motion. Then, we refine these policies using deep reinforcement learning. Our approach enables the acquisition of complex motions like pronking and back-flipping, effectively from scratch. Additionally, our method simplifies the traditionally labour-intensive task of reward shaping, boosting the efficiency of the learning process. Importantly, our framework proves particularly effective for articulated soft quadrupeds, whose inherent compliance and adaptability make them ideal for dynamic tasks but also introduce unique control challenges.
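
As an illustration of the first stage only, the following is a minimal sketch of a gradient-free random-search update for a linear policy, in the style of the ARS algorithm named in the figure captions. The toy dynamics in `rollout`, the state dimension, and all hyperparameters are assumptions made for this example; they are not taken from the paper, and the second-stage deep reinforcement learning refinement is not shown.

```python
import numpy as np


def rollout(M, horizon=50):
    """Return the episode reward of a linear policy on toy dynamics.

    Hypothetical stand-in for a simulator episode, not the robot model.
    """
    x = np.array([1.0, -1.0])          # assumed initial state
    total = 0.0
    for _ in range(horizon):
        a = M @ x                      # linear policy: action = M * state
        x = x + 0.1 * a                # assumed toy dynamics
        total -= float(x @ x)          # reward: keep the state near the origin
    return total


def ars(n_iters=100, n_dirs=8, top_b=4, step=0.02, noise=0.05, seed=0):
    """Random-search update: perturb M, compare paired rollouts, step uphill."""
    rng = np.random.default_rng(seed)
    M = np.zeros((2, 2))               # start from an all-zero linear policy
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, 2, 2))
        r_plus = np.array([rollout(M + noise * d) for d in deltas])
        r_minus = np.array([rollout(M - noise * d) for d in deltas])
        # Keep the top_b directions, ranked by the better of the two rewards.
        order = np.argsort(-np.maximum(r_plus, r_minus))[:top_b]
        # Scale the step by the standard deviation of the rewards used.
        sigma = np.concatenate([r_plus[order], r_minus[order]]).std() + 1e-8
        grad = np.tensordot(r_plus[order] - r_minus[order], deltas[order], axes=1)
        M = M + step / (top_b * sigma) * grad
    return M
```

Calling `ars()` returns a 2x2 policy matrix. No reference trajectory appears anywhere in the search; only the scalar episode reward guides it, which is the property the abstract highlights for the first stage.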

Paper Structure

This paper contains 29 sections, 17 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Two-stage learning procedure for acrobatic motions with a flight phase. Without defining reference motions, the (rigid and soft) quadrupedal robot realizes (a) jumping in place and jumping forward, (b) pronking, and (c) back-flipping. 'ARS' and 'ARS+DRL' respectively denote the linear policy generated by the first-stage ES and the refined policy after second-stage retraining.
  • Figure 2: Examples of highly dynamic motions that the robot could learn using the proposed strategy. Red curves plot the CoM movements where the arrows point to the movement directions.
  • Figure 3: Quadrupedal robot Go1 (a) and its PEA arrangement (b). In the homing pose, the initial height is $z_0=0.32$ m. We call $q_1,q_4$ the calf angles, $q_2,q_5$ the thigh angles, and $q_3,q_6$ the hip angles.
  • Figure 4: Reward profiles for the in-place jumping achieved by the ARS algorithm when using different policy representations.
  • Figure 5: Jumping performance comparison. In the second column, a higher reward means better performance, while a lower contact force means a more compliant landing. Note that the contact forces are divided by 1000 N. In the third column, 'height' and 'length' respectively denote the maximal jumping height and the jumping distance. In the fourth column, the learned CoM trajectories (under the 'ARS+DRL' policy) for a rigid robot and a soft robot are compared. 'w/o spring' denotes the rigid case without springs engaged.
  • ...and 2 more figures