AC-MASAC: An Attentive Curriculum Learning Framework for Heterogeneous UAV Swarm Coordination

Wanhao Liu; Junhong Dai; Yixuan Zhang; Shengyun Yin; Panshuo Li

TL;DR

This work tackles cooperative path planning for heterogeneous UAV swarms under partial observability by framing the problem as a decentralized POMDP and introducing AC-MASAC. The method combines a heterogeneous attention-based actor-critic architecture with a structured curriculum and stage-aware replay to mitigate sparse rewards and catastrophic forgetting. Key contributions include a role-aware attention mechanism that models Leader–Follower dependencies, a curriculum learning strategy with hierarchical policy transfer and stage-proportional experience replay, and experiments showing higher Success Rate (SR), Formation Keeping Rate (FKR), and Success-weighted Mission Time (SMT) scores than strong MARL baselines. The results demonstrate robust multi-agent coordination in dynamic environments and point to practical benefits for scalable UAV swarm control, with sim-to-real transfer as a promising direction for future work.

Abstract

Cooperative path planning for heterogeneous UAV swarms poses significant challenges for Multi-Agent Reinforcement Learning (MARL), particularly in handling asymmetric inter-agent dependencies and addressing the risks of sparse rewards and catastrophic forgetting during training. To address these issues, this paper proposes an attentive curriculum learning framework (AC-MASAC). The framework introduces a role-aware heterogeneous attention mechanism to explicitly model asymmetric dependencies. Moreover, a structured curriculum strategy is designed, integrating hierarchical knowledge transfer and stage-proportional experience replay to address the issues of sparse rewards and catastrophic forgetting. The proposed framework is validated on a custom multi-agent simulation platform, and the results show that our method has significant advantages over other advanced methods in terms of Success Rate, Formation Keeping Rate, and Success-weighted Mission Time. The code is available at https://github.com/Wanhao-Liu/AC-MASAC.

Paper Structure (20 sections, 11 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
PROBLEM STATEMENT
AC-MASAC Approach
POMDP Formulation
Heterogeneous Actor-Critic Architecture
Structured Curriculum Learning Framework
EXPERIMENTS AND DISCUSSION
Simulation Experiment
Ablation Experiment
Conclusion
HYPERPARAMETER DETAILS
Neural Network Architectures
Training Hyperparameters
CURRICULUM STAGE SPECIFICATIONS
Environmental Configurations
...and 5 more sections

Figures (9)

Figure 1: Conceptual diagram of the multi-UAV cooperative path planning task. The swarm, composed of a Leader and multiple Followers, must navigate towards a target while avoiding obstacles and maintaining formation. Each UAV makes decentralized decisions based on its role-specific local observations.
Figure 2: AC-MASAC framework overview. Top: Curriculum learning module controlling task difficulty. Bottom: Attention-enhanced actor-critic networks processing states for action generation and training via a curriculum-managed replay buffer.
Figure 3: AC-MASAC architecture diagram.
Figure 4: Attention-based network architectures. (a) Leader Actor, (b) Follower Actor, and (c) Structured Attention Critic. Inputs are role-specific states; outputs are Gaussian policy parameters ($\mu, \sigma$).
Figure 5: Environments for the different levels of training.
...and 4 more figures
