Learning Coordinated Maneuver in Adversarial Environments
Zechen Hu, Manshi Limbu, Daigo Shishika, Xuesu Xiao, Xuan Wang
TL;DR
The paper addresses coordinating a team of robots traversing a route in the presence of adversaries, aiming to minimize a combined cost of risk exposure and mission time. It first analyzes a single-adversary case to extract actionable insights, then scales to multiple robots and adversaries using a centralized MDP solved by PPO-based reinforcement learning with a novel multi-weighted hot state encoding and reward reshaping. The proposed H-PPO approach yields robust, generalizable coordination patterns (e.g., bounding overwatch) and outperforms baselines and practical solvers in simulated environments, while highlighting scalability and real-time applicability. The work lays groundwork for decentralized extensions and geometry-aware planning in more complex, uncertain adversarial settings.
Abstract
This paper addresses the problem of coordinating a team of robots traversing a route in the presence of adversaries with random positions. Our goal is to minimize the overall cost of the team, which is determined by (i) the risk accumulated while robots stay in adversary-impacted zones and (ii) the mission completion time. During traversal, robots can reduce their speed and act as a "guard" (the slower, the better), which decreases the risk that a given adversary imposes. This leads to a trade-off between the robots' guarding behaviors and their travel speeds. The formulated problem is highly non-convex and cannot be efficiently solved by existing algorithms. Our approach includes a theoretical analysis of the robots' behaviors for the single-adversary case. As the scale of the problem grows, finding the optimal solution with optimization approaches becomes challenging; we therefore employ reinforcement learning techniques, developing new encoding and policy-generating methods. Simulations demonstrate that our learning methods can efficiently produce team coordination behaviors. We discuss the reasoning behind these behaviors and explain why they reduce the overall team cost.
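To make the trade-off between guarding and travel speed concrete, the following is a minimal toy sketch and not the paper's formulation. It assumes a one-dimensional route, a single adversary zone given as an interval, a linear guard model (a stationary robot outside the zone cancels all risk, a robot at full speed cancels none), and a cost equal to accumulated risk plus a weighted mission time. The names `team_cost`, `speed_plans`, `base_risk`, and `time_weight` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def team_cost(
    speed_plans: List[List[float]],   # speed_plans[i][t]: speed of robot i at step t
    zone: Tuple[float, float],        # adversary-impacted interval [start, end) on the route
    route_length: float,
    v_max: float = 1.0,
    dt: float = 1.0,
    base_risk: float = 1.0,           # assumed risk rate for an unguarded robot in the zone
    time_weight: float = 0.5,         # assumed weight on mission completion time
) -> float:
    """Toy team cost: accumulated risk + time_weight * mission time."""
    positions = [0.0] * len(speed_plans)
    risk = 0.0
    for step in range(len(speed_plans[0])):
        # Clamp commanded speeds to [0, v_max] and advance every robot.
        speeds = [min(max(plan[step], 0.0), v_max) for plan in speed_plans]
        positions = [min(p + s * dt, route_length) for p, s in zip(positions, speeds)]
        in_zone = [zone[0] <= p < zone[1] for p in positions]
        # Assumed guard model: only robots outside the zone guard,
        # and a slower guard is a better guard (1.0 when stationary).
        guards = [1.0 - s / v_max for s, inside in zip(speeds, in_zone) if not inside]
        best_guard = max(guards, default=0.0)
        # Every robot inside the zone accrues the (possibly reduced) risk.
        risk += base_risk * (1.0 - best_guard) * dt * sum(in_zone)
        if all(p >= route_length for p in positions):
            return risk + time_weight * (step + 1) * dt
    raise ValueError("plan ends before every robot reaches the goal")


# Two robots, route of length 4, adversary zone covering [1, 3).
rush = [[1, 1, 1, 1], [1, 1, 1, 1]]                         # both cross at full speed
overwatch = [[1, 1, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 1, 1]]  # take turns guarding

print(team_cost(rush, (1.0, 3.0), 4.0))       # 6.0
print(team_cost(overwatch, (1.0, 3.0), 4.0))  # 3.5
```

Under these assumed parameters, the alternating plan takes longer but accrues no risk, so its total cost is lower. With a larger `time_weight` the ranking can flip, which illustrates the kind of trade-off the abstract describes.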
