An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks

Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schäffer, João Silvério, Freek Stulp

TL;DR

The paper tackles the reproducibility and complexity challenges of deep reinforcement learning for locomotion by proposing an open-loop, model-free baseline based on phase-locked nonlinear oscillators to generate joint trajectories. Parameters of the oscillators are optimized with CMA-ES, and a PD controller maps desired positions to torques, enabling fast, parameter-efficient control without state feedback. Empirical results on MuJoCo locomotion benchmarks show the open-loop baseline is competitive with simple DRL baselines and robust to sensor perturbations, while enabling sim-to-real transfer in an elastic quadruped where RL struggles. The work highlights the value of incorporating domain knowledge to reduce search space and complexity, and argues for hybrid approaches that combine feedforward simplicity with feedback adaptability for real-world robotics.
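The control pipeline summarized above (oscillators produce desired joint positions, and a PD controller turns them into torques) can be illustrated with a minimal sketch. It rests on several assumptions: the oscillator is a plain sinusoid, not necessarily the paper's exact nonlinear form, and the names `JointOscillator`, `desired_positions`, and `pd_torques`, as well as all numeric values, are hypothetical. The optimization step with CMA-ES is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class JointOscillator:
    """Hypothetical per-joint parameters; the names are illustrative only."""
    amplitude: float
    phase: float
    offset: float


def desired_positions(t, omega, joints):
    """Open-loop targets: they depend only on time t, never on robot state."""
    return [j.amplitude * math.sin(omega * t + j.phase) + j.offset for j in joints]


def pd_torques(q_des, q, q_dot, kp, kd):
    """PD law tracking q_des, assuming a desired joint velocity of zero."""
    return [kp * (qd - qi) - kd * dqi for qd, qi, dqi in zip(q_des, q, q_dot)]


# Two joints sharing one frequency, so their phase difference stays fixed.
joints = [JointOscillator(0.5, 0.0, 0.1), JointOscillator(0.3, math.pi, -0.2)]
omega = 2.0 * math.pi  # rad/s, assumed value

q_des = desired_positions(0.25, omega, joints)
tau = pd_torques(q_des, q=[0.0, 0.0], q_dot=[0.0, 0.0], kp=10.0, kd=0.5)
```

In this sketch the only tunable quantities are the three numbers per joint plus the shared frequency, which is the sense in which such a controller is parameter-efficient. The low-level PD loop still reads joint positions and velocities, but the trajectory generator itself never sees an observation.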

Abstract

In search of a simple baseline for Deep Reinforcement Learning in locomotion tasks, we propose a model-free open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, where RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.

Paper Structure (15 sections, 2 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Performance profiles on the MuJoCo locomotion tasks (left) and probability of improvements of the open-loop approach over baselines, with a 95% confidence interval.
  • Figure 2: Metrics results on MuJoCo locomotion tasks using median and interquartile mean (IQM), with a 95% confidence interval.
  • Figure 3: Parameter efficiency of the different algorithms. Results are presented with a 95% confidence interval, and scores are normalized with respect to the open-loop baseline.
  • Figure 4: Robustness to sensor noise (with varying intensities), sensor failures of Type I (all zeros) and Type II (constant large value), and external disturbances. All results are presented with a 95% confidence interval, and scores are normalized with respect to the open-loop baseline.
  • Figure 5: Robotic quadruped with elastic actuators in simulation (left) and real hardware (right).
  • ...and 3 more figures