JCAS-MARL: Joint Communication and Sensing UAV Networks via Resource-Constrained Multi-Agent Reinforcement Learning

Islam Guven; Mehmet Parlak

Abstract

Multi-UAV networks are increasingly deployed for large-scale inspection and monitoring missions, where operational performance depends on the coordination of sensing reliability, communication quality, and energy constraints. In particular, the rapid increase in overflowing waste bins and illegal dumping sites has created a need for efficient detection of waste hotspots. In this work, we introduce JCAS-MARL, a resource-aware multi-agent reinforcement learning (MARL) framework for joint communication and sensing (JCAS)-enabled UAV networks. Within this framework, multiple UAVs operate in a shared environment where each agent jointly controls its trajectory and the resource allocation of an OFDM waveform used simultaneously for sensing and communication. Battery consumption, charging behavior, and associated CO$_2$ emissions are incorporated into the system state to model realistic operational constraints. Information sharing occurs over a dynamic communication graph determined by UAV positions and wireless channel conditions. Waste hotspot detection requires consensus among multiple UAVs to improve reliability. Using this environment, we investigate how MARL policies exploit the sensing-communication-energy trade-off in JCAS-enabled UAV networks. Simulation results demonstrate that adaptive pilot-density control learned by the agents can outperform static configurations, particularly in scenarios where sensing accuracy and communication connectivity vary across the environment.

Paper Structure (19 sections, 13 equations, 8 figures, 3 tables)

Introduction
System Model
JCAS Sensing and Communication Model
Energy and Carbon Model
Task Completion
MARL Formulation
Action Space
Reward Design with JCAS and Sustainability Signals
Observations
Knowledge Propagation
Training Algorithm
Results
Environment setup
Training Costs
PPO Convergence Across Target Densities
...and 4 more sections

Figures (8)

Figure 1: Illustration of the UAV waste-hotspot localization mission, in which a team of UAVs is deployed from a central depot that acts as a base station and patrols the designated grid area.
Figure 2: Training evolution of the mean episode return for different numbers of targets.
Figure 3: Evaluation success rate as a function of the number of UAVs for 3–11 targets.
Figure 4: Evaluation mission time as a function of the number of UAVs.
Figure 5: Total energy consumption per mission as a function of the number of UAVs.
...and 3 more figures
