Diffusion-Reinforcement Learning Hierarchical Motion Planning in Multi-agent Adversarial Games

Zixuan Wu, Sean Ye, Manisha Natarajan, Matthew C. Gombolay

TL;DR

Problem: evasive motion planning in large, partially observable, multi-agent pursuit-evasion settings. Approach: a hierarchical diffusion-RL framework in which a diffusion-based global planner guides a low-level SAC evasion policy, plus costmap-based path selection. Contributions: improved detection and goal-reaching metrics, interpretability and flexibility via the costmap, efficiency gains, and generalizability, including a real-robot demonstration. Significance: enables robust, scalable evasive navigation in realistic adversarial environments.

Abstract

Reinforcement Learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches in tasks ranging from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). Pursuit-evasion problems are relevant to various applications, such as search-and-rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture. We propose a hierarchical architecture that integrates a high-level diffusion model, which plans global paths responsive to environment data, with a low-level RL policy that reasons about evasive versus global path-following behavior. Benchmark results across different domains and observability levels show that our approach outperforms baselines by 77.18% and 47.38% on detection rate and goal-reaching rate, respectively, which leads to a 51.4% increase in the performance score on average. Additionally, our method improves the interpretability, flexibility, and efficiency of the learned policy.

Paper Structure

This paper contains 21 sections, 2 equations, 7 figures, 2 tables, and 4 algorithms.

Figures (7)

  • Figure 1: Diffusion-RL Framework Overview: We first collect RRT* paths into a dataset. We then use a diffusion model to learn the distribution of the RRT* paths and generate samples as high-level global plans that help learn a low-level evasive RL policy. A posterior costmap is built from the learned hierarchy and the detection risk, and can be used to select the best global path at inference time.
  • Figure 2: Prisoner Escape Domain (left) and Narco Traffic Interdiction Domain (right)
  • Figure 3: Our Diffusion-RL trajectory vs. the SAC trajectory. The colorbar indicates the distance between the evader and the closest pursuer, and the purple line indicates the diffusion global path.
  • Figure 4: Path Planning with the Costmap: The costmap (a) is constructed by correlating the agent's risk of detection to its location on the map. We show in (b) that the agent can successfully identify where the cameras are. Given this costmap, the agent can select a path that best evades high-cost regions (c). Additional obstacles can be added ad-hoc (d) and a new path can be chosen (e). The grey areas indicate untraversable obstacles or danger zones.
  • Figure 5: The paths from the diffusion model trained on RRT* are more diverse than A* paths (\ref{fig:narrow}-\ref{fig:diverse}). Compared to the traditional RRT* planner, the diffusion model leverages the power of parallel computing to generate trajectories an order of magnitude faster (\ref{table:diffusion_time}). The diffusion-guided RL training also significantly decreases the training time (\ref{fig:training_time}).
  • ...and 2 more figures
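
The costmap-based path selection described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes a grid costmap and candidate paths given as lists of cells. The function names, the toy cost values, and the summed-cost scoring rule are illustrative assumptions, not the authors' implementation.

```python
import math


def path_cost(path, costmap):
    """Sum the costmap values of the cells a path visits (assumed scoring rule)."""
    return sum(costmap[r][c] for r, c in path)


def select_path(candidates, costmap):
    """Return the lowest-cost candidate, or None if none is traversable."""
    if not candidates:
        return None
    best = min(candidates, key=lambda p: path_cost(p, costmap))
    return None if math.isinf(path_cost(best, costmap)) else best


def add_obstacle(costmap, cells):
    """Return a copy of the costmap with the given cells marked untraversable."""
    updated = [row[:] for row in costmap]
    for r, c in cells:
        updated[r][c] = math.inf
    return updated


# Toy 3x3 costmap: higher values stand for higher (hypothetical) detection risk.
costmap = [
    [0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
]

# Two hand-written candidates from (0, 0) to (0, 2); in the paper's setting
# the candidates would be samples from the diffusion global planner.
path_direct = [(0, 0), (0, 1), (0, 2)]
path_detour = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]

chosen = select_path([path_direct, path_detour], costmap)

# Add an obstacle ad hoc and reselect, mirroring panels (d) and (e).
blocked_map = add_obstacle(costmap, [(2, 1)])
rechosen = select_path([path_direct, path_detour], blocked_map)
```

With these toy values, the detour is chosen first because it avoids the high-cost column. Once cell (2, 1) is blocked, the direct path becomes the only traversable candidate. Marking obstacles with an infinite cost keeps the selection rule unchanged when the map is edited.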