Design and Evaluation of a Multi-Agent Perception System for Autonomous Flying Networks
Diogo Ferreira, Pedro Ribeiro, André Coelho, Rui Campos
TL;DR
MAPS fills a gap in Flying Networks (FNs), the autonomous perception of users and their service demands, by fusing visual and audio inputs through multi-modal large language models (MM-LLMs) and a three-agent Brain to produce structured Service Level Specifications (SLSs) for zero-touch network control. On a synthetic emergency dataset, it demonstrates near real-time performance (SLS generation under 130 seconds in 90% of cases) and user detection accuracies above 70%, while revealing latency bottlenecks dominated by LLM API interactions. The work also contributes a reproducible multimodal synthetic dataset and analyzes practical deployment considerations, including edge computing to reduce latency. Overall, MAPS advances autonomous sensing and decision-making for responsive, infrastructure-light FN operations.
Abstract
Autonomous Flying Networks (FNs) are emerging as a key enabler of on-demand connectivity in dynamic and infrastructure-limited environments. However, current approaches mainly focus on UAV placement, routing, and resource management, neglecting the autonomous perception of users and their service demands, a critical capability for zero-touch network operation. This paper presents the Multi-Agent Perception System (MAPS), a modular and scalable system that leverages multi-modal large language models (MM-LLMs) and agentic Artificial Intelligence (AI) to interpret visual and audio data collected by UAVs and generate Service Level Specifications (SLSs) describing user count, spatial distribution, and traffic demand. MAPS is evaluated using a synthetic multimodal emergency dataset, achieving user detection accuracies above 70% and generating SLSs in under 130 seconds in 90% of cases. Results demonstrate that combining audio and visual modalities enhances user detection and show that MAPS provides the perception layer required for autonomous, zero-touch FNs.
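
To make the notion of an SLS concrete, the following minimal Python sketch shows one way a record covering user count, spatial distribution, and traffic demand could be represented. The class names, field names, and units (UserGroup, x_m, y_m, demand_mbps) are illustrative assumptions; this section does not specify the actual SLS schema used by MAPS.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class UserGroup:
    """One cluster of users seen by a UAV (hypothetical fields and units)."""
    x_m: float          # assumed: east offset from a reference point, in meters
    y_m: float          # assumed: north offset from a reference point, in meters
    user_count: int     # number of users detected in this cluster
    demand_mbps: float  # assumed: aggregate traffic demand of the cluster


@dataclass
class ServiceLevelSpecification:
    """Illustrative SLS: user count, spatial distribution, traffic demand."""
    groups: List[UserGroup]

    @property
    def total_users(self) -> int:
        # User count is the sum over all detected clusters.
        return sum(g.user_count for g in self.groups)

    @property
    def total_demand_mbps(self) -> float:
        # Traffic demand is the sum of per-cluster demands.
        return sum(g.demand_mbps for g in self.groups)

    def to_json(self) -> str:
        # Serialize to a structured document a network controller could parse.
        return json.dumps({
            "total_users": self.total_users,
            "total_demand_mbps": self.total_demand_mbps,
            "groups": [asdict(g) for g in self.groups],
        })


if __name__ == "__main__":
    sls = ServiceLevelSpecification(groups=[
        UserGroup(x_m=10.0, y_m=20.0, user_count=8, demand_mbps=16.0),
        UserGroup(x_m=-5.0, y_m=3.5, user_count=4, demand_mbps=2.5),
    ])
    print(sls.to_json())
```

A structured record of this kind is what would let a downstream FN controller act on perception output without human interpretation, which is the role the abstract assigns to the SLS.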
