Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing

Alex Zongo, Filippos Fotiadis, Ufuk Topcu, Peng Wei

Abstract

We address robust separation assurance for small Unmanned Aircraft Systems (sUAS) under GPS degradation and spoofing via Multi-Agent Reinforcement Learning (MARL). In cooperative surveillance, each aircraft (or agent) broadcasts its GPS-derived position; when such position broadcasts are corrupted, the entire observed air traffic state becomes unreliable. We cast this state observation corruption as a zero-sum game between the agents and an adversary: with probability R, the adversary perturbs the observed state to maximally degrade each agent's safety performance. We derive a closed-form expression for this adversarial perturbation, bypassing adversarial training entirely and enabling linear-time evaluation in the state dimension. We show that this expression approximates the true worst-case adversarial perturbation with second-order accuracy. We further bound the safety performance gap between clean and corrupted observations, showing that it degrades at most linearly with the corruption probability under Kullback-Leibler regularization. Finally, we integrate the closed-form adversarial policy into a MARL policy gradient algorithm to obtain a robust counter-policy for the agents. In a high-density sUAS simulation, we observe near-zero collision rates under corruption levels up to 35%, outperforming a baseline policy trained without adversarial perturbations.
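The corruption model described above (with probability R, the adversary replaces the clean observed state with a perturbed one) can be sketched as follows. The adversary here is a hypothetical fixed spoofing bias, a stand-in for the paper's worst-case perturbation, and all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_observation(state, adversary, R, rng):
    """With probability R, return the adversarially perturbed state;
    otherwise pass the clean state through unchanged."""
    if rng.random() < R:
        return adversary(state)
    return state

# Hypothetical adversary: shift the broadcast position by a fixed bias
# (the paper instead uses a closed-form worst-case perturbation).
bias = np.array([50.0, -50.0])   # metres, illustrative spoofing offset
adversary = lambda s: s + bias

state = np.zeros(2)              # clean GPS-derived position
R = 0.35                         # corruption probability from the abstract
obs = [corrupt_observation(state, adversary, R, rng) for _ in range(10_000)]
frac = np.mean([not np.allclose(o, state) for o in obs])
print(round(frac, 2))            # empirical corruption rate, close to 0.35
```

Over many episodes the empirical corruption rate concentrates around R, which is the sense in which the abstract's "corruption levels up to 35%" should be read.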

Paper Structure

This paper contains 16 sections, 5 theorems, 34 equations, 5 figures, 1 algorithm.

Key Result

Theorem 1

Let Assumption (assump:local_diff) hold, and let the uncertainty set be componentwise bounded as in (eq:uncertainty_set). Then, a minimizer of the first-order (FO) adversarial problem (eq:linearized_adversary_problem) is
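A minimal sketch of what a minimizer of this kind looks like, under the assumption that the componentwise bound is the box $|\delta_i|\le\epsilon_i$ and the linearized objective is $g^\top\delta$ with $g$ the gradient of the agent's value in the state: each coordinate is pushed to the boundary opposite the gradient sign, which is consistent with the linear-time evaluation in the state dimension claimed in the abstract. The exact expression in Theorem 1 is not reproduced here, so this is an illustration, not the paper's formula:

```python
import numpy as np

def fo_adversarial_perturbation(grad_v, eps):
    """Minimizer of the linearized objective grad_v . delta over the
    componentwise box |delta_i| <= eps_i: each coordinate goes to the
    bound opposite the gradient sign. Costs O(d) in state dimension d."""
    return -eps * np.sign(grad_v)

grad_v = np.array([0.8, -1.2, 2.0, 3.1])   # hypothetical value gradient
eps = np.full(4, 0.5)                      # per-coordinate bound
delta = fo_adversarial_perturbation(grad_v, eps)
print(delta)                                # [-0.5  0.5 -0.5 -0.5]
```

At this minimizer the linearized objective equals $-\sum_i \epsilon_i |g_i|$, the smallest value attainable inside the box, so no adversarial training or inner optimization loop is needed to evaluate it.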

Figures (5)

  • Figure 2: An en-route airspace setting for sUAS package delivery. WP = WayPoint. The sUAS flying in this airspace must maintain safe separation when impacted by degraded or spoofed GPS signals. The network includes two routes that first cross at WP 1 (red), then merge at WP 7 (light red). Each route spans about $10$ km, illustrating a compact yet realistic urban scenario for sUAS operations.
  • Figure 3: Neural network architecture. Ownship and intruder states are independently encoded through fully connected (FC) layers with LeakyReLU activations. Intruder embeddings are aggregated via multi-head attention with pooling. The concatenated representation branches into policy and value heads.
  • Figure 4: Safety performance under increasing observation corruption. Left: Near mid-air collision (NMAC) count of the small UAS. Right: Minimum separation distance between the aircraft agents achieved per episode. The robust policy maintains near-zero NMACs through $R\approx0.35$ and degrades gracefully beyond that point, while the nominal policy deteriorates sharply. Shaded regions indicate $\pm$ standard deviation.
  • Figure 5: Snapshot of high-density sUAS traffic under the robust policy at corruption rate $R = 0.35$. Aircraft (triangles) traverse structured routes through waypoints, with separation buffers shown as circles. Yellow indicates aircraft approaching loss of separation; green indicates nominal separation status. Labels display flight ID, speed (knots), and altitude (ft). Despite adversarial GPS perturbations, all aircraft maintain safe separation.
  • Figure 6: Ablation study isolating the contributions of invariance and anchoring regularization. Left: NMAC count. Right: Minimum separation distance. Invariance regularization without anchoring (blue) destabilizes training, yielding poor performance even at $R=0$. Anchoring alone (green) preserves nominal behavior but degrades sharply at high $R$. The full method (purple) combines both regularizers for consistent performance. Shaded regions indicate $\pm$ standard deviation.

Theorems & Definitions (6)

  • Definition 1
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Proposition 1
  • Corollary 2