Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity

Jinghao Xin, Jinwoo Kim, Zhi Li, Ning Li

TL;DR

Color tackles the poor training efficiency and generalization of DRL in local path planning (LPP) by coupling a partially decoupled training framework, ASL, with a lightweight, vectorizable simulator, Sparrow. ASL uses Vectorized Data Collection (VDC), a Vectorized Epsilon-Greedy Exploration Mechanism (VEM), and a Time Feedback Mechanism (TFM) to reduce training time and improve sample efficiency, while Sparrow provides conversion-free data flow, simplified kinematics, and vectorized diversity to improve generalization. Together they allow a real-world LPP planner to be trained within about one hour of simulation, with strong Sim2Real and Task2Task generalization (e.g., 38/42 real-world successes). The result is a high-throughput DRL pipeline with diversified training environments for mobile robot navigation.

Abstract

Deep Reinforcement Learning (DRL) has exhibited efficacy in resolving the Local Path Planning (LPP) problem. However, such application in the real world is immensely limited due to the deficient training efficiency and generalization capability of DRL. To alleviate these two issues, a solution named Color is proposed, which consists of an Actor-Sharer-Learner (ASL) training framework and a mobile robot-oriented simulator Sparrow. Specifically, the ASL intends to improve the training efficiency of DRL algorithms. It employs a Vectorized Data Collection (VDC) mode to expedite data acquisition, decouples the data collection from model optimization by multithreading, and partially connects the two procedures by harnessing a Time Feedback Mechanism (TFM) to evade data underuse or overuse. Meanwhile, the Sparrow simulator utilizes a 2D grid-based world, simplified kinematics, and conversion-free data flow to achieve a lightweight design. The lightness facilitates vectorized diversity, allowing diversified simulation setups across extensive copies of the vectorized environments, resulting in a notable enhancement in the generalization capability of the DRL algorithm being trained. Comprehensive experiments, comprising 57 DRL benchmark environments, 32 simulated and 36 real-world LPP scenarios, have been conducted to corroborate the superiority of our method in terms of efficiency and generalization. The code and the video of this paper are accessible at https://github.com/XinJingHao/Color.
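
The partial coupling described in the abstract can be illustrated with a small arithmetic sketch. It assumes that TPS is the number of transitions trained on per transition collected, which matches the Figure 4 example (32 trained per 4 collected gives 8). The function `tfm_wait_times` and all of its parameters are hypothetical names chosen for illustration; they are not taken from the Color repository, and the actual TFM may differ in detail.

```python
def tfm_wait_times(n_envs, batch_size, target_tps, t_collect, t_train):
    """Return (actor_wait, learner_wait) in seconds.

    Assumptions (illustrative only):
      - one vectorized actor step collects n_envs transitions in t_collect s
      - one learner update trains on batch_size transitions in t_train s
      - target_tps = transitions trained / transitions collected
    actor_wait is per vectorized step; learner_wait is per update.
    """
    # Learner updates needed per vectorized step to hold the target ratio.
    updates_per_step = target_tps * n_envs / batch_size
    train_time_per_step = updates_per_step * t_train

    if train_time_per_step > t_collect:
        # Learner is the bottleneck: the actor pauses so that collected
        # data is not underused.
        return train_time_per_step - t_collect, 0.0

    # Actor is the bottleneck: the learner pauses after each update so
    # that stored data is not overused.
    return 0.0, (t_collect - train_time_per_step) / updates_per_step


# Example: 4 environments, batch of 32, target ratio of 8.
actor_wait, learner_wait = tfm_wait_times(
    n_envs=4, batch_size=32, target_tps=8, t_collect=0.010, t_train=0.004
)
```

In this sketch only the faster side ever waits, so the two threads run concurrently while the ratio of trained to collected transitions stays near the target.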

Paper Structure

This paper contains 35 sections, 11 equations, 18 figures, 10 tables, and 2 algorithms.

Figures (18)

  • Figure 1: An overview of Color. The left part is the mobile robot-oriented simulator Sparrow, where simulation parameters (such as control interval, control delay, friction, inertia, velocity range, and sensor noise) and training maps can be readily diversified via vectorized environments. The right part illustrates our efficient DRL training framework, ASL. The seamless integration between Sparrow and ASL is achieved through their interdependent vectorized environments. Notably, Color can rapidly train a DRL-based local path planner with high generalization capacity.
  • Figure 2: Vectorized $\epsilon$-greedy Exploration Mechanism. Here, the horizontal axis is the index of the vectorized environments, and $N$ is the total number of vectorized environments.
  • Figure 3: VEM-based exploration. Here, the raw action is generated by the DRL model, and the VEM action will be applied to the $N$ distinct copies of vectorized environments.
  • Figure 4: Illustration of the coupled framework (left, representative: DQN), the completely decoupled framework (middle, representative: Ape-X), and the partially decoupled framework with TFM (right, representative: ASL). In this example, both DQN and ASL seek to maintain a $\mathrm{TPS}$ of 8. However, one cycle in DQN (collecting 4 transitions, then training on 32 transitions) takes longer because collection proceeds step by step and alternates with training. ASL, by contrast, is computationally efficient because its interaction is vectorized and collection and training run concurrently. Ape-X fully decouples collection from training, which leaves $\mathrm{TPS}$ uncontrolled and puts training stability and sample efficiency at risk.
  • Figure 5: Schematic of the Actor-Sharer-Learner training framework
  • ...and 13 more figures
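
The vectorized exploration named in the captions for Figures 2 and 3 can be sketched as follows: each environment copy gets its own exploration rate, and the model's raw action is replaced by a random one with that probability. The linear schedule, the `eps_max` value, and the function names `epsilon_schedule` and `vem_actions` are assumptions made for illustration; the paper's Figure 2 defines the actual shape of the schedule, which may differ.

```python
import random


def epsilon_schedule(n_envs, eps_max=0.8):
    """Per-environment exploration rates.

    Assumption: rates grow linearly with the environment index, from 0
    (pure exploitation) up to eps_max. This shape is hypothetical.
    """
    if n_envs == 1:
        return [eps_max]
    return [eps_max * (i / (n_envs - 1)) for i in range(n_envs)]


def vem_actions(raw_actions, epsilons, n_actions, rng):
    """Turn raw (greedy) actions into exploratory ones, one per environment.

    Environment i keeps its raw action with probability 1 - epsilons[i]
    and otherwise takes a uniformly random discrete action.
    """
    actions = []
    for raw, eps in zip(raw_actions, epsilons):
        if rng.random() < eps:
            actions.append(rng.randrange(n_actions))
        else:
            actions.append(raw)
    return actions


# Example: 8 environment copies, 5 discrete actions, model proposes action 3.
rng = random.Random(0)
eps = epsilon_schedule(8)
actions = vem_actions([3] * 8, eps, n_actions=5, rng=rng)
```

Because the rates differ across copies, a single vectorized step mixes exploiting and exploring behavior without any per-step annealing of a global epsilon.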