Multi-Agent Meta-Offline Reinforcement Learning for Timely UAV Path Planning and Data Collection

Eslam Eldeeb, Hirley Alves

TL;DR

This work addresses timely UAV path planning and data collection under dynamic network configurations by developing an offline multi-agent reinforcement learning framework that combines Conservative Q-Learning (CQL) with Model-Agnostic Meta-Learning (MAML). It introduces two variants, independent training (M-I-MARL) and centralized training with decentralized execution (M-CTDE-MARL), both of which use offline data to learn initial policies that adapt rapidly to new objectives such as age-of-information (AoI) minimization and power reduction. The CTDE-based variant converges faster and more stably, adapting about 50% faster than the baselines, and both variants consistently outperform non-MAML offline MARL methods. The approach offers a data-efficient, safe, and scalable solution for real-time UAV coordination in evolving wireless environments.

Abstract

Multi-agent reinforcement learning (MARL) has been widely adopted in high-performance computing and complex data-driven decision-making in the wireless domain. However, conventional MARL schemes face many obstacles in real-world scenarios. First, most MARL algorithms are online, which might be unsafe and impractical. Second, MARL algorithms are environment-specific, meaning that network configuration changes require model retraining. This letter proposes a novel meta-offline MARL algorithm that combines conservative Q-learning (CQL) and model-agnostic meta-learning (MAML). CQL enables offline training by leveraging pre-collected datasets, while MAML ensures scalability and adaptability to dynamic network configurations and objectives. We propose two algorithm variants: independent training (M-I-MARL) and centralized training with decentralized execution (M-CTDE-MARL). Simulation results show that the proposed algorithm outperforms conventional schemes, especially the CTDE approach, which achieves 50% faster convergence in dynamic scenarios than the benchmarks. The proposed framework enhances scalability, robustness, and adaptability in wireless communication systems by optimizing UAV trajectories and scheduling policies.
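
To make the role of CQL in offline training more concrete, the following is a minimal sketch of the standard discrete-action CQL objective: a Bellman error plus a penalty that pushes down Q-values of actions absent from the dataset. It is not the paper's implementation. The function name `cql_loss`, the array layout, and the default values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_loss(q_values, next_q_target, actions, rewards, dones,
             alpha=1.0, gamma=0.99):
    """Discrete-action CQL loss on one batch of offline transitions.

    q_values:      (B, A) array, Q(s, .) from the network being trained
    next_q_target: (B, A) array, Q(s', .) from a target network
    actions:       (B,) integer actions stored in the dataset
    rewards:       (B,) rewards stored in the dataset
    dones:         (B,) 1.0 where the episode ended, else 0.0
    alpha, gamma:  illustrative defaults, not values from the paper
    """
    idx = np.arange(len(actions))
    q_data = q_values[idx, actions]  # Q(s, a) for the logged actions

    # Standard one-step TD target and squared Bellman error.
    td_target = rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)
    bellman = 0.5 * np.mean((q_data - td_target) ** 2)

    # Conservative term: log-sum-exp over all actions minus Q of the
    # logged action. It is non-negative and grows when unseen actions
    # receive high Q-values.
    conservative = np.mean(logsumexp(q_values, axis=1) - q_data)

    return bellman + alpha * conservative


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 4))
    loss = cql_loss(q, q, rng.integers(0, 4, size=8),
                    rng.normal(size=8), np.zeros(8))
    print(float(loss))
```

Because the penalty only depends on the logged batch, the loss can be evaluated without any interaction with the environment, which is the property the abstract relies on for safe offline training.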

Paper Structure

This paper contains 7 sections, 11 equations, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: Illustration of the proposed CQL-MAML algorithm, comprising meta-RL training and testing phases. The training phase runs offline CQL across different tasks (environments) with different objectives to find the optimal initial parameters. The testing phase then performs a few offline SGD steps from those parameters on a new, unseen task.
  • Figure 2: Convergence performance of the proposed algorithm compared to the benchmarks: (a) independent training case and (b) CTDE training case.
  • Figure 3: The effect of model parameters: (a) dataset size effect, (b) training tasks effect, and (c) achievable AoI-power for different objectives.
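
The two phases described in the Figure 1 caption can be sketched as a MAML loop. The example below is a toy illustration under stated assumptions, not the paper's algorithm: each task's offline CQL loss is replaced by a simple quadratic loss `0.5 * ||theta - center||^2` so that the exact MAML gradient can be written in closed form. The names `task_grad`, `adapt`, and `meta_train`, the task centers, and the learning rates are all hypothetical.

```python
import numpy as np


def task_grad(theta, center):
    """Gradient of the toy task loss 0.5 * ||theta - center||^2.

    Assumption: this quadratic stands in for a task's offline CQL loss.
    """
    return theta - center


def adapt(theta, center, inner_lr=0.1, steps=1):
    """Testing phase (and inner loop): a few SGD steps on one task."""
    for _ in range(steps):
        theta = theta - inner_lr * task_grad(theta, center)
    return theta


def meta_train(task_centers, inner_lr=0.1, outer_lr=0.5,
               iterations=200, seed=0):
    """Meta-training phase: find initial parameters that adapt well."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=task_centers.shape[1])
    for _ in range(iterations):
        meta_grad = np.zeros_like(theta)
        for center in task_centers:
            adapted = adapt(theta, center, inner_lr, steps=1)
            # Exact MAML gradient for this toy loss: the Jacobian of one
            # inner step with respect to theta is (1 - inner_lr) * I.
            meta_grad += (1.0 - inner_lr) * task_grad(adapted, center)
        theta = theta - outer_lr * meta_grad / len(task_centers)
    return theta


if __name__ == "__main__":
    # Four hypothetical training tasks and one unseen test task.
    train_tasks = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
    theta0 = meta_train(train_tasks)
    new_task = np.array([3.0, 1.0])
    theta_new = adapt(theta0, new_task, steps=5)
    print(theta0, theta_new)
```

With a neural Q-network the inner-step Jacobian is not available in closed form, so an automatic-differentiation framework or a first-order approximation would be needed in its place.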