CoRL-MPPI: Enhancing MPPI With Learnable Behaviours For Efficient And Provably-Safe Multi-Robot Collision Avoidance

Stepan Dergachev, Artem Pshenitsyn, Aleksandr Panov, Alexey Skrynnik, Konstantin Yakovlev

TL;DR

CoRL-MPPI addresses decentralized collision avoidance for dense multi-robot systems by fusing a pretrained cooperative RL policy with the MPPI controller. The learned policy biases MPPI's sampling toward cooperative, collision-avoiding actions while preserving MPPI's theoretical safety guarantees. The approach introduces a safety-constrained, two-branch planning scheme and demonstrates superior performance over ORCA, BVC, and a multi-agent MPPI baseline in dense simulations, with high success rates and shorter makespans. These results suggest strong potential for real-world swarms and motivate future work on sim-to-real transfer and online policy adaptation.

Abstract

Decentralized collision avoidance remains a core challenge for scalable multi-robot systems. One promising approach to this problem is Model Predictive Path Integral (MPPI), a framework that is naturally suited to handle any robot motion model and provides strong theoretical guarantees. In practice, however, an MPPI-based controller may produce suboptimal trajectories, because its performance relies heavily on uninformed random sampling. In this work, we introduce CoRL-MPPI, a novel fusion of Cooperative Reinforcement Learning and MPPI that addresses this limitation. We train an action policy (approximated as a deep neural network) in simulation to learn local cooperative collision avoidance behaviors. This learned policy is then embedded into the MPPI framework to guide its sampling distribution, biasing it towards more intelligent and cooperative actions. Notably, CoRL-MPPI preserves all the theoretical guarantees of regular MPPI. We evaluate our approach in dense, dynamic simulation environments against state-of-the-art baselines, including ORCA, BVC, and a multi-agent MPPI implementation. Our results demonstrate that CoRL-MPPI significantly improves navigation efficiency (measured by success rate and makespan) and safety, enabling agile and robust multi-robot navigation.
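The idea of biasing MPPI's sampling with a policy can be illustrated with a minimal sketch. It is not the authors' implementation: the single-integrator motion model, the cost terms, the `policy_fraction` split, and the hand-written `goal_policy` (a stand-in for the trained network) are all assumptions made for illustration. The sketch also holds the policy action fixed over the horizon and omits the importance-weight correction that a full MPPI derivation would apply to samples drawn from a shifted distribution.

```python
import math
import random


def goal_policy(pos, goal, obstacle):
    """Hypothetical stand-in for a learned policy: unit velocity toward the goal."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    norm = math.hypot(dx, dy) or 1.0
    return (dx / norm, dy / norm)


def rollout_cost(pos, controls, goal, obstacle, radius, dt):
    """Accumulated cost of one control sequence for a single-integrator robot."""
    x, y = pos
    cost = 0.0
    for vx, vy in controls:
        x, y = x + vx * dt, y + vy * dt
        cost += math.hypot(goal[0] - x, goal[1] - y)  # distance-to-goal term
        if math.hypot(obstacle[0] - x, obstacle[1] - y) < radius:
            cost += 1000.0  # collision penalty (assumed value)
    return cost


def mppi_step(pos, goal, obstacle, policy, horizon=10, samples=200,
              policy_fraction=0.5, sigma=0.5, lam=1.0, radius=0.5,
              dt=0.1, rng=None):
    """Return the first control of the cost-weighted average of sampled sequences."""
    rng = rng or random.Random(0)
    n_policy = int(samples * policy_fraction)
    sequences, costs = [], []
    for k in range(samples):
        # Policy branch: sample around the policy action.
        # Baseline branch: sample around zero, i.e. uninformed noise.
        mean = policy(pos, goal, obstacle) if k < n_policy else (0.0, 0.0)
        seq = [(rng.gauss(mean[0], sigma), rng.gauss(mean[1], sigma))
               for _ in range(horizon)]
        sequences.append(seq)
        costs.append(rollout_cost(pos, seq, goal, obstacle, radius, dt))
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    vx = sum(w * s[0][0] for w, s in zip(weights, sequences)) / total
    vy = sum(w * s[0][1] for w, s in zip(weights, sequences)) / total
    return (vx, vy)
```

Calling `mppi_step((0.0, 0.0), (5.0, 0.0), (0.0, 5.0), goal_policy)` yields a velocity pointing toward the goal, because the policy-centred samples accumulate lower cost and therefore dominate the weighted average.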

Paper Structure

This paper contains 33 sections, 17 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: The figure illustrates the core idea of our method for decentralized collision avoidance. The left panel shows the baseline MPPI controller, where random rollouts (yellow) lead to potential collisions (red crosses) and suboptimal control (red trajectory). The right panel depicts the proposed fusion of RL and MPPI, where rollouts from the learned policy (blue) bias the sampling distribution toward more cooperative and collision-free behaviors, improving final control performance while retaining the theoretical guarantees of MPPI.
  • Figure 2: Visualization of the safety-constrained update of the distribution parameters. The probability mass of unsafe controls (red region) is reduced to meet the required confidence level.
  • Figure 3: Illustrative visualization of the experimental scenarios. Scales and proportions are adjusted for clarity.
  • Figure 4: The average makespan of the evaluated algorithms across the Random, Circle, and Mesh (Dense) scenarios. Only instances in which 100% of runs succeeded are included. Lower is better.
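The notion in the Figure 2 caption, reducing the probability mass of unsafe controls until a confidence level is met, can be made concrete with a one-dimensional toy example. The paper's actual update rule is not given in this section, so everything below is an assumption: a scalar Gaussian control, a single upper bound `u_max` as the safe set, and a mean shift (with fixed standard deviation) as the correction.

```python
from statistics import NormalDist


def unsafe_mass(mu, sigma, u_max):
    """Probability that a control drawn from N(mu, sigma^2) exceeds u_max."""
    return 1.0 - NormalDist(mu, sigma).cdf(u_max)


def constrain_mean(mu, sigma, u_max, delta):
    """Shift the mean down, if needed, so that P(u > u_max) <= delta.

    Assumed rule for illustration only: sigma is kept fixed and only the
    mean moves; a mean that is already safe is returned unchanged.
    """
    z = NormalDist().inv_cdf(1.0 - delta)  # standard normal quantile
    return min(mu, u_max - z * sigma)


mu, sigma, u_max, delta = 1.0, 0.5, 1.2, 0.05
new_mu = constrain_mean(mu, sigma, u_max, delta)
print(unsafe_mass(mu, sigma, u_max), unsafe_mass(new_mu, sigma, u_max))
```

Before the shift, roughly a third of the probability mass lies beyond `u_max`; after it, the unsafe mass equals the allowed level `delta`.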