Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

TL;DR

This work investigates why deep reinforcement learning (DRL) policies generalize poorly and proposes a novel causal feature selection module that can be integrated into the policy network. The module filters non-causal factors out of the learned representations, reducing the influence of spurious correlations between those factors and the predicted actions.

Abstract

Collision avoidance navigation for unmanned aerial vehicle (UAV) swarms in unseen, complex outdoor environments is a challenging problem: the UAVs must navigate through varied obstacles and complex backgrounds. Existing collision avoidance methods based on deep reinforcement learning (DRL) show promising performance but generalize poorly, so their performance degrades in unseen environments. To address this issue, we investigate the cause of this weak generalization in DRL and propose a novel causal feature selection module. This module can be integrated into the policy network and effectively filters out non-causal factors in representations, thereby reducing the influence of spurious correlations between non-causal factors and action predictions. Experimental results demonstrate that our method achieves robust navigation and effective collision avoidance, especially in scenarios with unseen backgrounds and obstacles, and significantly outperforms existing state-of-the-art algorithms.
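
The filtering idea in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the four-dimensional latent, the fixed 0/1 mask, the linear action head, and all names and numbers are hypothetical. It isolates one mechanism: once a feature is masked out, changing it (for instance, a background that differs between training and testing) cannot change the predicted action, whereas an unmasked head reacts to it.

```python
# Hypothetical illustration: a binary feature mask applied to a latent
# representation before a linear action head. Names and numbers are invented.
from typing import List


def apply_mask(features: List[float], mask: List[int]) -> List[float]:
    """Zero out features whose mask entry is 0 (treated as non-causal)."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    return [f * m for f, m in zip(features, mask)]


def linear_action(features: List[float], weights: List[List[float]]) -> List[float]:
    """One output per row of `weights`: a dot product with the features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


# Toy latent: indices 0-1 stand in for obstacle geometry (kept),
# indices 2-3 for background appearance (masked out). Purely illustrative.
mask = [1, 1, 0, 0]
weights = [[0.5, -0.2, 0.8, 0.3],   # e.g. a forward-velocity command
           [0.1, 0.4, -0.6, 0.9]]   # e.g. a yaw-rate command

train_like = [1.0, 2.0, 0.5, 0.5]
unseen_bg = [1.0, 2.0, 3.0, -2.0]   # same geometry, different background

a_train = linear_action(apply_mask(train_like, mask), weights)
a_unseen = linear_action(apply_mask(unseen_bg, mask), weights)
print(a_train == a_unseen)          # True: masked features cannot move the action

a_train_raw = linear_action(train_like, weights)
a_unseen_raw = linear_action(unseen_bg, weights)
print(a_train_raw == a_unseen_raw)  # False: without the mask, background leaks in
```

In the framework described here the mask is produced from a trainable weight inside the actor network rather than fixed by hand; the toy only shows the effect of applying such a mask.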

Paper Structure (27 sections, 9 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: The influence of obstacle shape on success rate. For the DRL model, UAV obstacles are seen during training, while cube obstacles are unseen. During testing, the former have little influence on performance, while the latter significantly reduce the navigation success rate.
  • Figure 2: Illustration of the influence of non-causal representation factors. During policy learning, non-causal factors can form spurious correlations with action prediction. When testing scenarios differ from the training scenarios, these non-causal factors have adverse effects and lead to wrong actions.
  • Figure 3: The architecture of our framework for multi-UAV collision avoidance. The framework follows the Soft Actor-Critic (SAC) paradigm and uses a regularized autoencoder for visual representation extraction; it takes depth images, current velocity, and relative goal position as input and outputs flight control actions. In the actor network, we insert our CFS module for feature selection.
  • Figure 4: Causal Feature Selection Module. This module transforms a trainable weight into a binary mask for feature selection.
  • Figure 5: Simulation scenarios for model training and testing. The playground scenario is used for training, while the grassland, snow mountain, and forest scenarios are used for testing.
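
Figure 4 states only that the CFS module transforms a trainable weight into a binary mask. A hard 0/1 mask has zero gradient almost everywhere, so some relaxation is needed to train it. The sketch below assumes one common choice, a straight-through estimator (STE): the forward pass thresholds sigmoid(w) at 0.5, and the backward pass uses the sigmoid's derivative as if the mask were continuous. The function names, the threshold, and all numbers are hypothetical, and the paper may binarize the weight differently.

```python
# Hypothetical sketch: turning trainable real-valued weights into a binary
# feature mask with a straight-through estimator (STE). The binarization rule
# (sigmoid(w) > 0.5) is an assumption, not taken from the paper.
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binary_mask_forward(w: List[float], threshold: float = 0.5) -> List[int]:
    """Forward pass: hard 0/1 mask from sigmoid(w) > threshold."""
    return [1 if sigmoid(x) > threshold else 0 for x in w]


def binary_mask_backward(w: List[float], grad_mask: List[float]) -> List[float]:
    """Backward pass (STE): treat the mask as sigmoid(w), so
    dL/dw = dL/dmask * sigmoid'(w), although the hard step has zero gradient."""
    grads = []
    for x, g in zip(w, grad_mask):
        s = sigmoid(x)
        grads.append(g * s * (1.0 - s))
    return grads


w = [2.0, -1.0, 0.3, -0.2]          # trainable mask logits (invented values)
print(binary_mask_forward(w))        # [1, 0, 1, 0]

# Suppose the policy loss favors enabling feature 3 (negative gradient on m[3]).
grad_m = [0.0, 0.0, 0.0, -1.0]       # invented upstream gradient dL/dm
lr = 1.0
w = [x - lr * g for x, g in zip(w, binary_mask_backward(w, grad_m))]
print(binary_mask_forward(w))        # [1, 0, 1, 1]: the hard mask entry flipped
```

Gradients are written out by hand to keep the example dependency-free. In an autodiff framework the same STE is usually expressed as `soft + stop_gradient(hard - soft)`, where `stop_gradient` stands for `jax.lax.stop_gradient` or PyTorch's `.detach()`: the forward value equals the hard mask, while the gradient flows through the soft sigmoid.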