
Improving Generalization in Aerial and Terrestrial Mobile Robots Control Through Delayed Policy Learning

Ricardo B. Grando, Raul Steinmetz, Victor A. Kich, Alisson H. Kolling, Pablo M. Furik, Junior C. de Jesus, Bruna V. Guterres, Daniel T. Gamarra, Rodrigo S. Guerra, Paulo L. J. Drews-Jr

TL;DR

The paper addresses the generalization gap in deep reinforcement learning for autonomous aerial and terrestrial navigation by evaluating Delayed Policy Updates (DPU) within the TD3 framework. It systematically varies the update delay parameter $\eta$ and shows that larger delays can accelerate early learning and markedly improve performance in unseen environments. With $\eta=8$, the aerial robot reaches up to $98.67\%$ success on unseen tasks and the terrestrial robot around $85\%$. The experiments use ROS/Gazebo simulations with LIDAR sensing in obstacle-rich environments, and the results support adopting larger delays to mitigate overfitting and obtain robust continuous-control policies across diverse scenarios. These findings offer practical guidance for designing generalizable mobile-robot controllers and motivate further study of how DPU interacts with other regularization and augmentation methods.
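The delay mechanism varied in the paper can be illustrated with a minimal Python sketch. It assumes the standard TD3 schedule, in which the twin critics are updated on every training iteration while the actor and the target networks are updated once every $\eta$ iterations. The function name run_dpu_training and its two callback arguments are hypothetical placeholders for the real network updates, not the authors' code.

```python
from typing import Callable


def run_dpu_training(
    num_iterations: int,
    eta: int,
    update_critics: Callable[[], None],
    update_actor_and_targets: Callable[[], None],
) -> int:
    """Sketch of Delayed Policy Updates (DPU) as scheduled in TD3.

    Assumption: the critics are updated on every iteration, and the actor
    plus the (soft-updated) target networks only once every `eta`
    iterations. The callbacks are hypothetical stand-ins for real updates.
    Returns the number of actor updates performed.
    """
    if eta < 1:
        raise ValueError("eta must be a positive integer")
    actor_updates = 0
    # Iterations are counted from 1, so the first actor update happens at iteration eta.
    for it in range(1, num_iterations + 1):
        update_critics()  # twin Q-networks: updated every iteration
        if it % eta == 0:
            update_actor_and_targets()  # actor and targets: delayed by eta
            actor_updates += 1
    return actor_updates
```

With $\eta=8$ and 1000 iterations, this schedule updates the critics 1000 times and the actor 125 times. In TD3's original motivation, the delay gives the critic estimates time to settle before the policy is changed; larger $\eta$ means fewer policy updates per critic update.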

Abstract

Deep Reinforcement Learning (DRL) has emerged as a promising approach to enhancing motion control and decision-making across a wide range of robotic applications. While prior research has demonstrated the efficacy of DRL algorithms for autonomous mapless navigation of aerial and terrestrial mobile robots, these methods often generalize poorly when faced with unknown tasks and environments. This paper explores the impact of the Delayed Policy Updates (DPU) technique on generalization to new situations and on the overall performance of agents. Our analysis of DPU in aerial and terrestrial mobile robots reveals that this technique significantly reduces the generalization gap and accelerates learning, improving agent efficiency across diverse tasks and unknown scenarios.

Paper Structure (9 sections, 2 equations, 7 figures, 4 tables, 1 algorithm)

Figures (7)

  • Figure 1: Architecture of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and its mechanisms.
  • Figure 2: Training and evaluation scenarios for aerial mobile robots.
  • Figure 3: Training and evaluation scenarios for terrestrial mobile robots.
  • Figure 4: Reward moving average of the aerial mobile robot over 500 episodes of training.
  • Figure 5: Reward moving average of the terrestrial mobile robot over 5000 episodes of training.
  • ...and 2 more figures
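Figure 1 refers to TD3's mechanisms. Besides delayed policy updates, TD3 (as introduced by Fujimoto et al.) computes its critic target with clipped double-Q learning and target policy smoothing. The following Python sketch shows that target for a single transition under simplifying assumptions: scalar states and actions, target networks passed in as plain callables, and the reference TD3 default hyperparameters (gamma=0.99, policy_noise=0.2, noise_clip=0.5), which are not necessarily the values used in this paper. The name td3_target is hypothetical, and this is not the authors' implementation.

```python
import random
from typing import Callable, Optional


def td3_target(
    reward: float,
    done: bool,
    next_state: float,
    target_actor: Callable[[float], float],
    target_q1: Callable[[float, float], float],
    target_q2: Callable[[float, float], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    max_action: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Sketch of the TD3 critic target for one transition (scalar state/action assumed)."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: Gaussian noise, clipped to [-noise_clip, noise_clip].
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    # Smoothed target action, clipped to the valid action range.
    next_action = max(-max_action, min(max_action, target_actor(next_state) + noise))
    # Clipped double-Q learning: use the smaller of the two target critic estimates.
    target_q = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    # Bootstrapping is cut off at terminal transitions.
    return reward + gamma * (1.0 - float(done)) * target_q
```

Taking the minimum of the two critics counters the overestimation bias of a single Q-network, and the smoothing noise keeps the target from exploiting narrow peaks in the learned Q-function.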