Multi-Agent Meta-Advisor for UAV Fleet Trajectory Design in Vehicular Networks

Leonardo Spampinato; Lorenzo Mario Amorosa; Enrico Testi; Chiara Buratti; Riccardo Marini

TL;DR

This paper addresses cooperative multi-agent trajectory design under different service areas and takeoff configurations, where rapid and safe adaptation across scenarios is essential, and proposes the multi-agent meta-advisor with advisor override (MAMO).

Abstract

Future vehicular networks require continuous connectivity to serve highly mobile users in urban environments. To mitigate the coverage limitations of fixed terrestrial macro base stations (MBS) under non-line-of-sight (NLoS) conditions, fleets of unmanned aerial base stations (UABSs) can be deployed, dynamically repositioning to track vehicular users and traffic hotspots in coordination with the terrestrial network. This paper addresses cooperative multi-agent trajectory design under different service areas and takeoff configurations, where rapid and safe adaptation across scenarios is essential. We formulate the problem as a multi-task decentralized partially observable Markov decision process and solve it using centralized training and decentralized execution with a double dueling deep Q-network (3DQN), enabling online training for real-world deployments. However, efficient exploration remains a bottleneck, as conventional strategies such as $\epsilon$-greedy require careful tuning. To overcome this, we propose the multi-agent meta-advisor with advisor override (MAMO). This framework guides agent exploration through a meta-policy learned jointly across tasks, and it uses a dynamic override mechanism that allows agents to reject misaligned guidance when the advisor fails to generalize to a specific scenario. Simulation results across three realistic urban scenarios and multiple takeoff configurations show that MAMO achieves faster convergence and higher returns than tuned $\epsilon$-greedy baselines, outperforming both an advisor-only ablation and a single generalized policy. Finally, we demonstrate that the learned UABS fleet significantly improves network performance compared to deployments without aerial support.
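The exploration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the override decides that guidance is misaligned, so the rule below (reject the advised action when the agent's own Q-value for it trails its best Q-value by more than `override_margin`) and the fallback to the agent's greedy action are assumptions made purely for illustration. The names `select_action`, `advisor_action`, and `override_margin` are hypothetical.

```python
import random
from typing import Sequence, Tuple


def select_action(
    q_values: Sequence[float],
    advisor_action: int,
    epsilon: float,
    override_margin: float,
    rng: random.Random,
) -> Tuple[int, str]:
    """Choose one agent's action and report where it came from."""
    # Action the agent itself currently rates highest.
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])

    # With probability 1 - epsilon, exploit the agent's own estimate.
    if rng.random() >= epsilon:
        return greedy, "greedy"

    # Exploration step: consult the advisor instead of sampling uniformly.
    gap = q_values[greedy] - q_values[advisor_action]
    if gap > override_margin:
        # ASSUMED override rule: the advice looks misaligned with the
        # agent's own estimates, so the agent rejects it.
        return greedy, "override"
    return advisor_action, "advisor"


if __name__ == "__main__":
    rng = random.Random(0)
    q = [1.0, 5.0, 2.0]
    # epsilon = 1.0 forces the exploration branch on every call.
    print(select_action(q, advisor_action=2, epsilon=1.0, override_margin=5.0, rng=rng))
    print(select_action(q, advisor_action=0, epsilon=1.0, override_margin=1.0, rng=rng))
```

In this sketch a large `override_margin` makes the agent follow the advisor almost unconditionally, which roughly corresponds to the advisor-only ablation mentioned above, while a margin of zero makes the agent reject any advice that disagrees with its own greedy choice.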

Paper Structure (19 sections, 34 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
Related Work
System Model
Reference Scenario
Reference Application
Channel Model
Problem Formulation
Decentralized Partially Observable MDP
Multi-Agent Deep Reinforcement Learning System
Exploration Policies
$\epsilon$-greedy
Multi-Agent Meta-Advisor
Advisor Override
Numerical Results
Simulation Settings
...and 4 more sections

Figures (9)

Figure 1: Representation of the reference scenario.
Figure 2: CTDE architecture for UABS-aided vehicular networks. Each UABS $1, \ldots, U$ has a shared local copy of the trained model, which is updated periodically by the controller entity from experiences gathered by all the autonomous agents.
Figure 3: Exploration strategies for multi-task training.
Figure 4: Service area considered, with red triangles representing the MBS position.
Figure 5: Average return trends during training, averaged over tasks belonging to the same service area.
...and 4 more figures
