Table of Contents
Fetching ...

Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments

Jonas De Maeyer, Hossein Yarahmadi, Moharram Challenger

TL;DR

This work tackles dynamic path planning by introducing a region-aware, hierarchical reinforcement learning framework that eliminates reliance on a global planner. It deploys leaf-level Q-learning within a tree-structured decomposition of the environment and augments this with a retraining condition based on sub-environment success rates, plus federated Q-learning to accelerate learning. The key contributions include the hierarchical decomposition, a region-aware retraining mechanism, and two federated learning variants (EqAvg and ImAvg) that outperform single-agent baselines and approach A* Oracle in accuracy while offering better adaptation times and scalability. Experiments across easy, medium, and hard mazes, as well as a rigorous edge-case, demonstrate that federated approaches achieve robust performance with lower initial training and cumulative adaptation costs, supporting scalable deployment in dynamic, real-world settings. The study also outlines limitations and future work, including deep RL extensions, improved path optimality, and real-world robotic integration.

Abstract

Path planning in dynamic environments is a fundamental challenge in intelligent transportation and robotics, where obstacles and conditions change over time, introducing uncertainty and requiring continuous adaptation. While existing approaches often assume complete environmental unpredictability or rely on global planners, these assumptions limit scalability and practical deployment in real-world settings. In this paper, we propose a scalable, region-aware reinforcement learning (RL) framework for path planning in dynamic environments. Our method builds on the observation that environmental changes, although dynamic, are often localized within bounded regions. To exploit this, we introduce a hierarchical decomposition of the environment and deploy distributed RL agents that adapt to changes locally. We further propose a retraining mechanism based on sub-environment success rates to determine when policy updates are necessary. Two training paradigms are explored: single-agent Q-learning and multi-agent federated Q-learning, where local Q-tables are aggregated periodically to accelerate the learning process. Unlike prior work, we evaluate our methods in more realistic settings, where multiple simultaneous obstacle changes and increasing difficulty levels are present. Results show that the federated variants consistently outperform their single-agent counterparts and closely approach the performance of A* Oracle while maintaining shorter adaptation times and robust scalability. Although initial training remains time-consuming in large environments, our decentralized framework eliminates the need for a global planner and lays the groundwork for future improvements using deep RL and flexible environment decomposition.

Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments

TL;DR

This work tackles dynamic path planning by introducing a region-aware, hierarchical reinforcement learning framework that eliminates reliance on a global planner. It deploys leaf-level Q-learning within a tree-structured decomposition of the environment and augments this with a retraining condition based on sub-environment success rates, plus federated Q-learning to accelerate learning. The key contributions include the hierarchical decomposition, a region-aware retraining mechanism, and two federated learning variants (EqAvg and ImAvg) that outperform single-agent baselines and approach A* Oracle in accuracy while offering better adaptation times and scalability. Experiments across easy, medium, and hard mazes, as well as a rigorous edge-case, demonstrate that federated approaches achieve robust performance with lower initial training and cumulative adaptation costs, supporting scalable deployment in dynamic, real-world settings. The study also outlines limitations and future work, including deep RL extensions, improved path optimality, and real-world robotic integration.

Abstract

Path planning in dynamic environments is a fundamental challenge in intelligent transportation and robotics, where obstacles and conditions change over time, introducing uncertainty and requiring continuous adaptation. While existing approaches often assume complete environmental unpredictability or rely on global planners, these assumptions limit scalability and practical deployment in real-world settings. In this paper, we propose a scalable, region-aware reinforcement learning (RL) framework for path planning in dynamic environments. Our method builds on the observation that environmental changes, although dynamic, are often localized within bounded regions. To exploit this, we introduce a hierarchical decomposition of the environment and deploy distributed RL agents that adapt to changes locally. We further propose a retraining mechanism based on sub-environment success rates to determine when policy updates are necessary. Two training paradigms are explored: single-agent Q-learning and multi-agent federated Q-learning, where local Q-tables are aggregated periodically to accelerate the learning process. Unlike prior work, we evaluate our methods in more realistic settings, where multiple simultaneous obstacle changes and increasing difficulty levels are present. Results show that the federated variants consistently outperform their single-agent counterparts and closely approach the performance of A* Oracle while maintaining shorter adaptation times and robust scalability. Although initial training remains time-consuming in large environments, our decentralized framework eliminates the need for a global planner and lays the groundwork for future improvements using deep RL and flexible environment decomposition.

Paper Structure

This paper contains 111 sections, 16 equations, 34 figures, 2 algorithms.

Figures (34)

  • Figure 1: Agent-environment interaction loop: the agent observes state $S_t$, performs action $A_t$, receives reward $R_{t+1}$ and transitions to state $S_{t+1}$.
  • Figure 2: a $9 \times 9$ maze consisting of free space positions (white), static obstacles (black), dynamic obstacles (red), charging stations (yellow suns), and a robot.
  • Figure 3: a $9 \times 9$ maze divided into $3 \times 3$ sub-mazes. Each sub-maze is assigned to a dedicated processing agent.
  • Figure 4: Hierarchical decomposition until sub-environments of size $4 \times 4$ are reached.
  • Figure 5: Tree representation of the hierarchical decomposition framework.
  • ...and 29 more figures