Multi-Agent Reinforcement Learning-Based UAV Pathfinding for Obstacle Avoidance in Stochastic Environment

Qizhen Wu, Kexin Liu, Lei Chen, Jinhu Lü

TL;DR

This work proposes a centralized-training, decentralized-execution method for multi-agent reinforcement learning. Borrowing the rolling optimization idea from model predictive control, it performs multi-step value convergence to improve training efficiency.

Abstract

Traditional methods plan feasible paths for multiple agents in a stochastic environment. However, because these methods must iterate again whenever the environment changes, they incur high computational cost, especially for decentralized agents without a centralized planner. Although reinforcement learning offers a plausible solution because it generalizes across environments, it requires an enormous number of agent-environment interactions during training. Here, we propose a novel centralized training with decentralized execution method based on multi-agent reinforcement learning, improved with ideas from model predictive control. In our approach, agents communicate only with the centralized planner to make decentralized decisions online in the stochastic environment. Furthermore, considering the communication constraint with the centralized planner, each agent plans feasible paths through an extended observation, which combines information on neighboring agents using a distance-weighted mean field approach. Inspired by the rolling optimization approach of model predictive control, we conduct multi-step value convergence in multi-agent reinforcement learning to enhance training efficiency, which reduces the number of expensive interactions needed for convergence. Experimental results from comparison, ablation, and real-robot studies validate the effectiveness and generalization performance of our method.
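The abstract does not spell out the multi-step value update, so the following is only a generic illustration of the underlying idea: a multi-step (n-step) bootstrapped value target, which propagates reward information over several steps per update. The function name `n_step_target`, its parameters, and the default discount factor are assumptions made for this sketch, not the authors' formulation.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """Generic n-step value target (illustrative, not the paper's update).

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V,
    where V is the critic's value estimate at the state reached after
    the n rewards.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Two unit rewards, a bootstrap value of 10, and gamma = 0.5:
# 1 + 0.5 * (1 + 0.5 * 10) = 4.0
print(n_step_target([1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With a single reward this reduces to the familiar one-step temporal-difference target; longer reward windows trade more variance for faster propagation of value information.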

Paper Structure

This paper contains 19 sections, 1 theorem, 29 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

When considering agent $i$, the decentralized policy function $\pi^i(z^{i}_{\text{ext}})$ can be approximated by $\tilde{\pi}^i(z^i,\tilde{z}^i)$.
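As a rough illustration of what a distance-weighted mean field could look like, the sketch below aggregates neighbor observations into a single vector, giving closer neighbors more weight. It assumes that $\tilde{z}^i$ denotes such an aggregate and that the weights are normalized inverse distances. Both are assumptions for illustration, since the exact weighting is not given in this summary, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def distance_weighted_mean(own_pos: Sequence[float],
                           neighbor_pos: Sequence[Sequence[float]],
                           neighbor_obs: Sequence[Sequence[float]],
                           eps: float = 1e-6) -> List[float]:
    """Aggregate neighbor observations with inverse-distance weights.

    The inverse-distance weighting is an assumption of this sketch,
    not necessarily the weighting used in the paper.
    """
    if not neighbor_obs or len(neighbor_pos) != len(neighbor_obs):
        raise ValueError("need one position per neighbor observation")
    # eps avoids division by zero when a neighbor coincides with the agent.
    weights = [1.0 / (math.dist(own_pos, p) + eps) for p in neighbor_pos]
    total = sum(weights)
    dim = len(neighbor_obs[0])
    return [sum(w * obs[k] for w, obs in zip(weights, neighbor_obs)) / total
            for k in range(dim)]


def unweighted_mean(neighbor_obs: Sequence[Sequence[float]]) -> List[float]:
    """Plain mean field: every neighbor contributes equally."""
    n = len(neighbor_obs)
    dim = len(neighbor_obs[0])
    return [sum(obs[k] for obs in neighbor_obs) / n for k in range(dim)]


# Neighbors at distances 1 and 3 with scalar observations 0 and 4.
obs = [[0.0], [4.0]]
near_far = [(1.0, 0.0), (3.0, 0.0)]
print(distance_weighted_mean((0.0, 0.0), near_far, obs, eps=0.0))
print(unweighted_mean(obs))
```

In this toy case the weighted aggregate is pulled toward the nearer neighbor's observation, while the unweighted mean treats both neighbors alike.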

Figures (8)

  • Figure 1: Multi-UAV flight in an uncertain scenario. The blue and red spheres indicate target and hazardous areas, respectively. We consider the multi-UAV obstacle avoidance problem in a scenario where the locations and number of hazardous areas change randomly at regular intervals. UAVs perform real-time path planning based on the trained policies. In experiments, we maneuver Crazyflies through the ground control center with motion capture from OptiTrack.
  • Figure 2: An overview of our study. (a) Illustration of the objective in this paper. Each UAV should reach its target area while avoiding hazardous areas and collisions with neighboring UAVs. (b) The decision-making and training process of the UAV for path planning. (c) Comparison between unweighted mean field and distance-weighted mean field. (d) Description of the prediction and training process for the multi-step value convergence method in MARL.
  • Figure 3: Six-UAV path planning in the simulation platform. (a)-(c) show the whole process of path planning in the simulation. (d)-(f) are the real-time trajectories for UAV path planning.
  • Figure 4: Learning curves of our method and Dec-DDPG.
  • Figure 5: Ablation study results of our method.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1: Distance-weighted mean field approximation
  • Proof 1