A Real-World Quadrupedal Locomotion Benchmark for Offline Reinforcement Learning

Hongyin Zhang, Shuyu Yang, Donglin Wang

TL;DR

This benchmark provides fertile ground for future application-oriented ORL research and shows that the best-performing ORL algorithms achieve performance competitive with online RL, and even surpass it on some tasks.

Abstract

Online reinforcement learning (RL) methods are often data-inefficient or unreliable, making them difficult to train on real robotic hardware, especially quadruped robots. Learning robotic tasks from pre-collected data is a promising direction. Meanwhile, agile and stable legged locomotion remains an open problem in its general form. Offline reinforcement learning (ORL) has the potential to make breakthroughs in this challenging field, but its current bottleneck is the lack of diverse datasets for challenging, realistic tasks. To facilitate the development of ORL, we benchmark 11 ORL algorithms on a realistic quadrupedal locomotion dataset. The dataset is collected with a classic model predictive control (MPC) method, rather than the model-free online RL methods commonly used by previous benchmarks. Extensive experimental results show that the best-performing ORL algorithms achieve performance competitive with model-free RL, and even surpass it on some tasks. However, there is still a gap between learning-based methods and MPC, especially in terms of stability and rapid adaptation. Our proposed benchmark will serve as a development platform for testing and evaluating the performance of ORL algorithms on real-world legged locomotion tasks.

Paper Structure (15 sections, 6 figures)

Figures (6)

  • Figure 1: Data distribution on the Return, COT and COV metrics. The yellower a point, the larger its COV value. The number indicates the task number $MN$.
  • Figure 2: Performance comparison of ORL algorithms on Return and COT metrics across all realistic tasks. The black vertical line indicates one standard deviation. Three random seeds are used.
  • Figure 3: Instability of the ORL algorithms' returns across random seeds for all tasks. The black dashed and solid lines indicate the COV of the BC and MPC algorithms, respectively.
  • Figure 4: Fluctuations in the return and energy consumption of the CQL algorithm across all real-world environments and tasks. The number on the horizontal axis indicates the task number $MN$. The dots and vertical bars represent the mean and one standard deviation, respectively. Three random seeds are used.
  • Figure 5: Performance comparison across tasks on the Indoor floor (left) and across environments with the locomotion command (0.4 m/s, 0 rad/s) (right). "Online-RL" indicates the DreamWaQ algorithm. "Offline-RL-Best" indicates the ORL algorithm with the best performance on the corresponding task (environment): PLAS, CQL, AWAC, PLAS, AWAC and BCQ for the different tasks (left), and AWAC, CQL, PLAS, CQL and BEAR for the different environments (right). The black vertical line indicates one standard deviation. Three random seeds are used.
  • ...and 1 more figures