Learning Coordinated Maneuver in Adversarial Environments

Zechen Hu, Manshi Limbu, Daigo Shishika, Xuesu Xiao, Xuan Wang

TL;DR

The paper addresses coordinating a team of robots traversing a route in the presence of adversaries, aiming to minimize a combined cost of risk exposure and mission time. It first analyzes a single-adversary case to extract actionable insights, then scales to multiple robots and adversaries using a centralized MDP solved by PPO-based reinforcement learning with a novel multi-weighted hot state encoding and reward reshaping. The proposed H-PPO approach yields robust, generalizable coordination patterns (e.g., bounding overwatch) and outperforms baselines and practical solvers in simulated environments, while highlighting scalability and real-time applicability. The work lays groundwork for decentralized extensions and geometry-aware planning in more complex, uncertain adversarial settings.

Abstract

This paper addresses the coordination of a team of robots traversing a route in the presence of adversaries with random positions. Our goal is to minimize the overall cost of the team, which is determined by (i) the accumulated risk when robots stay in adversary-impacted zones and (ii) the mission completion time. During traversal, robots can reduce their speed and act as a 'guard' (the slower, the better), which reduces the risk that certain adversaries impose. This leads to a trade-off between the robots' guarding behaviors and their travel speeds. The formulated problem is highly non-convex and cannot be efficiently solved by existing algorithms. Our approach includes a theoretical analysis of the robots' behaviors for the single-adversary case. As the scale of the problem expands, finding the optimal solution with optimization approaches becomes challenging; we therefore employ reinforcement learning techniques, developing new encoding and policy-generating methods. Simulations demonstrate that our learning methods can efficiently produce team coordination behaviors. We discuss the reasoning behind these behaviors and explain why they reduce the overall team cost.
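The trade-off between guarding and travel speed can be illustrated with a toy calculation. The sketch below is not the paper's model: the one-dimensional route, the single rectangular risk zone, the linear guard strength `1 - speed / v_max`, and every name and parameter (`team_cost`, `time_weight`, `unit_risk`, and so on) are assumptions chosen only to make the two cost terms concrete.

```python
def team_cost(speed_plans, route_length=10.0, zone=(4.0, 6.0),
              unit_risk=1.0, time_weight=1.0, v_max=1.0, dt=0.5,
              max_steps=10_000):
    """Toy team cost = accumulated risk + time_weight * mission time.

    speed_plans[i] lists robot i's speed at each step; the last entry is
    reused once the list runs out. All modelling choices are assumptions.
    """
    n = len(speed_plans)
    pos = [0.0] * n
    risk = 0.0
    step = 0
    while min(pos) < route_length:
        if step >= max_steps:
            raise ValueError("plans never bring every robot to the goal")
        active = [p < route_length for p in pos]
        speeds = [plan[min(step, len(plan) - 1)] if active[i] else 0.0
                  for i, plan in enumerate(speed_plans)]
        in_zone = [zone[0] <= p < zone[1] for p in pos]
        for i in range(n):
            if not in_zone[i]:
                continue
            # Assumed guard model: any robot still on the route and outside
            # the zone guards with strength 1 - v / v_max (slower is better).
            guard = max((1.0 - speeds[j] / v_max for j in range(n)
                         if j != i and active[j] and not in_zone[j]),
                        default=0.0)
            risk += unit_risk * (1.0 - guard) * dt
        pos = [p + v * dt for p, v in zip(pos, speeds)]
        step += 1
    return risk + time_weight * step * dt


# Both robots drive through the zone together at full speed.
rush = [[1.0], [1.0]]
# One robot waits and guards while the other crosses, then they swap roles.
overwatch = [[1.0] * 12 + [0.0] * 12 + [1.0], [0.0] * 12 + [1.0]]

for r in (1.0, 3.0):
    print(r, team_cost(rush, unit_risk=r), team_cost(overwatch, unit_risk=r))
```

Under these assumptions, rushing is cheaper when the unit risk is low (14.0 versus 16.0 at `unit_risk=1.0`), while taking turns guarding is cheaper when it is high (16.0 versus 22.0 at `unit_risk=3.0`). This is the kind of tension the learned policy has to resolve.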

Paper Structure (12 sections, 9 equations, 6 figures, 1 table)

Figures (6)

  • Figure 2: RL Implementation: H-PPO with multi-weighted hot encoding and reward reshaping
  • Figure 3: Experiment environments with different adversary configurations. The height represents the unit risk each adversary generates at different locations.
  • Figure 4: Using D-PPO and H-PPO to solve the team coordination problem in the first two environments of Fig. 3. The $x$-axis represents time and the $y$-axis represents the distance the robot has traveled in the environment, so the slope is the robot's speed. The colored dots on the trajectories indicate the adversary the robot is currently guarding against. In (c), the purple dashed line represents the current adversary position. The shaded regions are the adversary-impacted zones, with darker colors in the middle representing higher risk, corresponding to Fig. 3.
  • Figure 5: Validations of Reward Reshaping and Weighted-hot State Encoding Techniques.
  • Figure 6: Coordination of three robots in M3 using H-PPO. The figure is read the same way as Fig. 4.
  • ...and 1 more figures

Theorems & Definitions (1)

  • proof