DTPPO: Dual-Transformer Encoder-based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

Anning Wei, Jintao Liang, Kaiyuan Lin, Ziyue Li, Rui Zhao

TL;DR

DTPPO combines a Spatial Transformer (inter-agent dynamics) and a Temporal Transformer (temporal dependencies) with Proximal Policy Optimization so that multi-UAV policies can navigate unseen environments without retraining. In simulation, it outperforms current MADRL methods in transferability, obstacle avoidance, and navigation efficiency across environments with varying obstacle densities.

Abstract

Existing multi-agent deep reinforcement learning (MADRL) methods for multi-UAV navigation face challenges in generalization, particularly when applied to unseen complex environments. To address these limitations, we propose a Dual-Transformer Encoder-based Proximal Policy Optimization (DTPPO) method. DTPPO enhances multi-UAV collaboration through a Spatial Transformer, which models inter-agent dynamics, and a Temporal Transformer, which captures temporal dependencies to improve generalization across diverse environments. This architecture allows UAVs to navigate new, unseen environments without retraining. Extensive simulations demonstrate that DTPPO outperforms current MADRL methods in terms of transferability, obstacle avoidance, and navigation efficiency across environments with varying obstacle densities. The results confirm DTPPO's effectiveness as a robust solution for multi-UAV navigation in both known and unseen scenarios.
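The two-encoder idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes single-head attention with identity query/key/value projections (a real encoder learns these), it assumes the spatial pass runs before the temporal pass, and the names `self_attention`, `dual_encode`, and `history` are hypothetical. It only shows where each encoder attends: across agents at one time step, then across time steps for one agent.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: Q, K, and V are the tokens themselves (identity projections),
    which keeps the sketch short; a trained encoder would learn projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def dual_encode(history):
    """history[t][i] is the feature vector of agent i at time step t."""
    # Spatial pass: at each time step, every agent attends to all agents.
    spatial = [self_attention(step) for step in history]
    num_agents = len(history[0])
    encoded = []
    for i in range(num_agents):
        # Temporal pass: agent i attends over its own sequence of spatial features.
        sequence = [spatial[t][i] for t in range(len(history))]
        # Keep the feature at the latest time step as the agent's summary.
        encoded.append(self_attention(sequence)[-1])
    return encoded


# Two time steps, three agents, two features per agent (made-up numbers).
history = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]],
]
features = dual_encode(history)
```

In this sketch `features` holds one vector per agent, which a PPO actor or critic head could consume. Because attention operates on a set of tokens, the same code accepts a different number of agents or time steps without modification.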

Paper Structure

This paper contains 22 sections, 11 equations, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure S1: A schematic illustration of zero-shot transfer to a previously unseen environment (Scene-III) after training on known environments (Scene-I and Scene-II).
  • Figure S2: Overview of DTPPO.
  • Figure S3: The navigation algorithm is tested in three types of environments: square-column obstacles, cylindrical obstacles, and mixed obstacles. Different obstacle densities can be set for training.
  • Figure S4: Transfer reward during training.
  • Figure S5: Ablation study on different components in DTPPO.
  • ...and 2 more figures