Cat-and-Mouse Satellite Dynamics: Divergent Adversarial Reinforcement Learning for Contested Multi-Agent Space Operations

Cameron Mehlman; Joseph Abramov; Gregory Falco

TL;DR

This paper addresses robust autonomous evasion in contested space, where adversaries actively pursue a satellite. It introduces Divergent Adversarial Reinforcement Learning (DARL), a two-stage MARL framework that first learns a base evader policy, then trains multiple divergent adversaries to maximize exploration and refines the evader against them. The approach uses a divergent loss $\mathcal{L}_{KL}$ to promote diverse adversary policies, together with Soft Actor-Critic with a low target entropy. Validation is conducted in a 3-degrees-of-freedom, partially observable cat-and-mouse scenario modeled as capture-the-flag, with Clohessy–Wiltshire dynamics, $m \times m$ voxelized observations, and a $3$ m/1 m safety threshold. Results show that DARL outperforms the EVADE, BE, SA, MA, and NSA benchmarks in success rate and robustness, highlighting the value of training against diverse adversaries for generalizing to unseen tactics and contested environments. The work also outlines pathways toward sim-to-real deployment on space or UAV platforms, with domain randomization and safety considerations. The findings have practical implications for autonomous satellite operations in crowded or hostile orbital regimes, enabling more reliable and adaptable evasive behavior under adversarial pressure.
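The summary names Clohessy–Wiltshire dynamics without stating them. As a minimal sketch, the standard Clohessy–Wiltshire (Hill) equations for relative motion about a circular reference orbit can be integrated as below. The function names, the RK4 integrator, the axis convention (x radial, y along-track, z cross-track), and the mean-motion value are illustrative assumptions, not details taken from the paper.

```python
import math


def cw_derivative(state, accel, n):
    """Time derivative of [x, y, z, vx, vy, vz] under the CW equations.

    n is the mean motion of the reference orbit (rad/s); accel is the
    commanded thrust acceleration (ax, ay, az). Axis convention assumed:
    x radial, y along-track, z cross-track.
    """
    x, y, z, vx, vy, vz = state
    ax, ay, az = accel
    return [
        vx,
        vy,
        vz,
        3.0 * n * n * x + 2.0 * n * vy + ax,
        -2.0 * n * vx + ay,
        -n * n * z + az,
    ]


def rk4_step(state, accel, n, dt):
    """Advance the state by dt with a classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return [si + h * ki for si, ki in zip(s, k)]

    k1 = cw_derivative(state, accel, n)
    k2 = cw_derivative(shifted(state, k1, dt / 2.0), accel, n)
    k3 = cw_derivative(shifted(state, k2, dt / 2.0), accel, n)
    k4 = cw_derivative(shifted(state, k3, dt), accel, n)
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def propagate(state, accel, n, dt, steps):
    """Hold accel constant and take `steps` RK4 steps of size dt."""
    for _ in range(steps):
        state = rk4_step(state, accel, n, dt)
    return state


if __name__ == "__main__":
    n = 0.0011  # rad/s, an assumed value roughly typical of low Earth orbit
    period = 2.0 * math.pi / n
    # With vy0 = -2 * n * x0 the unforced relative orbit is closed (no drift).
    start = [1.0, 0.0, 0.0, 0.0, -2.0 * n, 0.0]
    end = propagate(start, (0.0, 0.0, 0.0), n, period / 2000, 2000)
    print([round(v, 6) for v in end])
```

The drift-free initial condition is a convenient check: without thrust, the state should return to its starting value after one orbital period.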

Abstract

As space becomes increasingly crowded and contested, robust autonomous capabilities for multi-agent environments are gaining critical importance. Current autonomous systems in space primarily rely on optimization-based path planning or long-range orbital maneuvers, which have not yet proven effective in adversarial scenarios where one satellite is actively pursuing another. We introduce Divergent Adversarial Reinforcement Learning (DARL), a two-stage Multi-Agent Reinforcement Learning (MARL) approach designed to train autonomous evasion strategies for satellites engaged with multiple adversarial spacecraft. Our method enhances exploration during training by promoting diverse adversarial strategies, leading to more robust and adaptable evader models. We validate DARL through a cat-and-mouse satellite scenario, modeled as a partially observable multi-agent capture-the-flag game in which two adversarial 'cat' spacecraft pursue a single 'mouse' evader. DARL's performance is compared against several benchmarks, including an optimization-based satellite path planner, demonstrating its ability to produce highly robust models for adversarial multi-agent space environments.
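The abstract says diversity among adversaries is promoted during training, and the TL;DR attributes this to a KL-based divergent loss, but neither gives its form. The sketch below is one plausible reading, offered only as an assumption: each adversary outputs a diagonal Gaussian action distribution (as is common with Soft Actor-Critic), and the added term is the negative mean symmetric KL divergence over all adversary pairs, so that minimizing it pushes the policies apart. The names `gaussian_kl` and `divergence_penalty` are hypothetical, and the paper's actual loss may differ.

```python
import math
from itertools import combinations


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and std devs."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp * sp + (mp - mq) ** 2) / (2.0 * sq * sq) - 0.5
    return total


def divergence_penalty(policies):
    """Negative mean symmetric KL over all pairs of adversary policies.

    `policies` is a list of (mu, sigma) pairs, each describing one adversary's
    action distribution for the same observation. This is an assumed form of a
    divergent loss term: the more the policies differ, the lower the value.
    """
    pairs = list(combinations(policies, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for (mu_a, sig_a), (mu_b, sig_b) in pairs:
        total += gaussian_kl(mu_a, sig_a, mu_b, sig_b)
        total += gaussian_kl(mu_b, sig_b, mu_a, sig_a)
    return -total / (2.0 * len(pairs))


if __name__ == "__main__":
    same = [([0.0], [1.0]), ([0.0], [1.0])]
    apart = [([0.0], [1.0]), ([1.0], [1.0])]
    print(divergence_penalty(same), divergence_penalty(apart))
```

In a training loop, a term like this would be added to each adversary's policy loss with a weighting coefficient; the choice of weight and of symmetric versus one-sided KL is left open here.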

Paper Structure (20 sections, 7 equations, 5 figures, 1 table, 2 algorithms)

INTRODUCTION
RELATED WORKS
METHOD
Evader Problem Definition
Adversary Problem Definition
Training Scheme
Stage I: Learning a Base Evader Policy
Stage II: Learning from Divergent Adversaries
EXPERIMENT AND RESULTS
Experimental Setup
Evaluation Benchmarks
EVADE
Base Evader Policy (BE)
Single Adversary (SA)
Base Multi-Adversary (MA)
...and 5 more sections

Figures (5)

Figure 1: A depiction of the training scheme we propose. The evader policy $\boldsymbol{\pi}_e$ is initially trained in a static-obstacle avoidance environment. The resulting policy is then retrained in a MARL environment with multiple adversarial policies that are encouraged to produce dissimilar behaviors through a divergent loss term.
Figure 2: A description of how the voxelized state space (left) is represented as a flattened matrix $\boldsymbol{H}_f$ (right).
Figure 3: An image of the obstacle evasion environment used to train the base evader policy, where the evader (white) must reach $g_e$ without colliding with any of the obstacles (red).
Figure 4: Base evader training curve (left), and DARL, MA, and BA Evader training curves (right). The dotted red line marks the beginning of the evader policy's network updates.
Figure 5: Validation curve comparing the average test performance of the DARL, MA, and SA models trained at different timesteps.
