Multi-Agent Meta-Advisor for UAV Fleet Trajectory Design in Vehicular Networks

Leonardo Spampinato, Lorenzo Mario Amorosa, Enrico Testi, Chiara Buratti, Riccardo Marini

TL;DR

This paper addresses cooperative multi-agent trajectory design under different service areas and takeoff configurations, where rapid and safe adaptation across scenarios is essential, and proposes the multi-agent meta-advisor with advisor override (MAMO).

Abstract

Future vehicular networks require continuous connectivity to serve highly mobile users in urban environments. To mitigate the coverage limitations of fixed terrestrial macro base stations (MBS) under non-line-of-sight (NLoS) conditions, fleets of unmanned aerial base stations (UABSs) can be deployed, dynamically repositioning to track vehicular users and traffic hotspots in coordination with the terrestrial network. This paper addresses cooperative multi-agent trajectory design under different service areas and takeoff configurations, where rapid and safe adaptation across scenarios is essential. We formulate the problem as a multi-task decentralized partially observable Markov decision process and solve it using centralized training and decentralized execution with double dueling deep Q-network (3DQN), enabling online training for real-world deployments. However, efficient exploration remains a bottleneck, with conventional strategies like $\epsilon$-greedy requiring careful tuning. To overcome this, we propose the multi-agent meta-advisor with advisor override (MAMO). This framework guides agent exploration through a meta-policy learned jointly across tasks. It uses a dynamic override mechanism that allows agents to reject misaligned guidance when the advisor fails to generalize to a specific scenario. Simulation results across three realistic urban scenarios and multiple takeoff configurations show that MAMO achieves faster convergence and higher returns than tuned $\epsilon$-greedy baselines, outperforming both an advisor-only ablation and a single generalized policy. Finally, we demonstrate that the learned UABS fleet significantly improves network performance compared to deployments without aerial support.
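To make the contrast between $\epsilon$-greedy exploration and advisor-guided exploration concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how the override decision is made, so the rule used here (reject the advisor's suggestion when the agent's own Q-value for it trails its best Q-value by more than `margin`) is a hypothetical stand-in. The function names, the `margin` parameter, and the returned labels are all illustrative assumptions.

```python
import random
from typing import List, Tuple


def greedy_action(q_values: List[float]) -> int:
    """Index of the largest Q-value (first one on ties)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_greedy(q_values: List[float], epsilon: float,
                   rng: random.Random) -> int:
    """Baseline: explore uniformly at random with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def advised_exploration(q_values: List[float], advisor_action: int,
                        epsilon: float, margin: float,
                        rng: random.Random) -> Tuple[int, str]:
    """Exploration steps follow the advisor unless it is overridden.

    ASSUMPTION: the override test below is a hypothetical rule chosen
    for illustration; the source does not specify the criterion.
    """
    best = greedy_action(q_values)
    if rng.random() >= epsilon:
        return best, "greedy"  # exploitation step
    gap = q_values[best] - q_values[advisor_action]
    if gap > margin:
        # The agent judges the advice misaligned and explores on its own.
        return rng.randrange(len(q_values)), "override"
    return advisor_action, "advisor"


rng = random.Random(0)
q = [0.0, 1.0, 0.2]
followed = advised_exploration(q, advisor_action=2, epsilon=1.0,
                               margin=1.0, rng=rng)
rejected = advised_exploration(q, advisor_action=2, epsilon=1.0,
                               margin=0.5, rng=rng)
exploit = advised_exploration(q, advisor_action=2, epsilon=0.0,
                              margin=0.5, rng=rng)
```

In this sketch the advisor only changes which action is taken on exploration steps, so the agent's greedy behaviour is left untouched. A larger `margin` makes the agent more willing to follow advice that its own value estimates disagree with.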

Paper Structure

This paper contains 19 sections, 34 equations, 9 figures, 3 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Representation of the reference scenario.
  • Figure 2: CTDE architecture for UABS-aided vehicular networks. Each UABS $1, \ldots, U$ has a shared local copy of the trained model, which is updated periodically by the controller entity from experiences gathered by all the autonomous agents.
  • Figure 3: Exploration strategies for multi-task training.
  • Figure 4: Service area considered, with red triangles representing the MBS position.
  • Figure 5: Average return trends during training, averaged over tasks belonging to the same service area.
  • ...and 4 more figures