Multi-Objective Algorithms for Learning Open-Ended Robotic Problems

Martin Robert, Simon Brodeur, Francois Ferland

TL;DR

This work introduces a robust framework for training quadrupedal robots, promising significant advancements in robotic locomotion and open-ended robotic problems.

Abstract

Quadrupedal locomotion is a complex, open-ended problem vital to expanding autonomous vehicle reach. Traditional reinforcement learning approaches often fall short due to training instability and sample inefficiency. We propose a novel method leveraging multi-objective evolutionary algorithms as an automatic curriculum learning mechanism, which we name Multi-Objective Learning (MOL). Our approach significantly enhances the learning process by projecting velocity commands into an objective space and optimizing for both performance and diversity. Tested within the MuJoCo physics simulator, our method demonstrates superior stability and adaptability compared to baseline approaches. In difficult scenarios, it achieved 19% and 44% fewer errors than our best baseline algorithm under a uniform and a tailored evaluation, respectively. This work introduces a robust framework for training quadrupedal robots, promising significant advancements in robotic locomotion and open-ended robotic problems.

Paper Structure

This paper contains 11 sections, 3 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: Render of the simulated 12-degree-of-freedom quadrupedal robot used in the MuJoCo environment.
  • Figure 2: A two-dimensional representation of the objective space, illustrating locomotion performance in the x and y directions. Selecting non-dominated points when there is perfect performance in at least one direction results in four points (black square), but in infinitely many points when they are L2 normalized (blue circle). We use simplex (yellow triangle) vertices as objectives to keep the latter and avoid the curse of dimensionality.
  • Figure 3: Comparison of the approaches on different operational constraints based on their mean performance distance to a set of desired commands. Lower is better; the cap bar represents one standard deviation from the trial's mean. Results are from ten trials for each approach in each scenario.
  • Figure 4: Comparison of training mean reward for all operational constraints. The mean reward was calculated from every 400 steps of the 50 sampled commands. Error bars represent one standard deviation from the trial's mean. Constant progression and smaller error bars indicate better training stability. Results are from ten trials for each approach in each scenario.
  • Figure 5: a) Mean density for all eight quadrants of the task space, per approach and per operational constraint. Higher is better, and the cap bar represents one standard deviation from the trial's mean. Results are from ten trials for each approach in each scenario. b) Ablation study of different schedules for the mutation probability (P-08, P-1) and strength (S-05, S-10, S-15) using NSGA-II. Results are from the nominal scenario and based on the set of test commands. Lower is better, and the cap bar is one standard deviation from the mean of five trials.
  • ...and 1 more figure
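
The non-dominated selection described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' code: the helper name `non_dominated`, the sampling grid, and the restriction to the positive quadrant are assumptions made for illustration, and the reading of "perfect performance in at least one direction" as points on the boundary of the unit square is an interpretation of the caption. Under that reading, a single corner survives per quadrant (four in total over the four sign quadrants), whereas every L2-normalized point survives.

```python
import numpy as np


def non_dominated(points):
    """Boolean mask of points not dominated by any other point (maximization).

    Hypothetical helper for illustration; not taken from the paper.
    """
    pts = np.asarray(points, dtype=float)
    mask = np.ones(len(pts), dtype=bool)
    for i, p in enumerate(pts):
        # q dominates p if q >= p on every objective and q > p on at least one.
        dominates_p = np.all(pts >= p, axis=1) & np.any(pts > p, axis=1)
        mask[i] = not dominates_p.any()
    return mask


# Positive-quadrant points with perfect performance (value 1) in at least one
# direction: the right and top edges of the unit square.
t = np.linspace(0.0, 1.0, 11)
square = np.vstack([
    np.column_stack([np.ones_like(t), t]),  # perfect in x
    np.column_stack([t, np.ones_like(t)]),  # perfect in y
])
square = np.unique(square, axis=0)  # drop the duplicated corner (1, 1)

# The same points after L2 normalization lie on the unit circle.
circle = square / np.linalg.norm(square, axis=1, keepdims=True)

square_front = square[non_dominated(square)]
circle_front = circle[non_dominated(circle)]

print(len(square_front), "non-dominated point(s) on the square")
print(len(circle_front), "non-dominated point(s) on the circle")
```

In this sketch the square keeps only its corner because that corner is at least as good as every other boundary point in both objectives. On the circle, improving one objective always costs the other, so no sampled point dominates another and the front grows with the sampling density.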