Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments

Mateus G. Machado, João G. Melo, Cleber Zanchettin, Pedro H. M. Braga, Pedro V. Cunha, Edna N. S. Barros, Hansenclever F. Bassani

TL;DR

Problem: RL for robot motion planning in dynamic RoboCup SSL environments. Approach: a model-free SAC-based path-planning framework evaluated across Baseline, Proposed, and Obstacle RL environments, with Frame Skip and CAPS to stabilize actions. Contributions: a simplified Proposed environment, a mobile-obstacle variant, and an FSCAPS stabilization strategy, plus real-world validation. Findings: obstacle-free experiments show substantial gains, including a time saving of roughly 60%; obstacle tests show robust avoidance and low collision rates, and real-world trials demonstrate transferability. Significance: the results support practical, scalable RL-based path planning for fast, multi-robot SSL settings and guide Sim2Real deployment strategies.

Abstract

This work investigates the potential of Reinforcement Learning (RL) to tackle robot motion planning challenges in the dynamic RoboCup Small Size League (SSL). Using a heuristic control approach, we evaluate RL's effectiveness in obstacle-free and single-obstacle path-planning environments. Ablation studies reveal significant performance improvements. Our method achieved a 60% time gain in obstacle-free environments compared to baseline algorithms. Additionally, our findings demonstrated dynamic obstacle avoidance capabilities, adeptly navigating around moving blocks. These findings highlight the potential of RL to enhance robot motion planning in the challenging and unpredictable SSL environment.

Paper Structure (13 sections, 1 equation, 3 figures, 3 tables)


Figures (3)

  • Figure 1: General workflow for robot motion planning and control, divided into global planning, local planning, and motion control. The first two stages plan the points and velocities the agent should follow, while the last controls the robot. This work focuses on the first two modules of this workflow.
  • Figure 2: Visualization of the "Baseline", "Proposed", and "Obstacle" learning environments with example rewards. The blue (R), magenta (a), yellow (O), and orange (T) circles represent the robot's position, the action taken, the obstacle, and the target goal, respectively. The white arc illustrates the angular difference, $\delta(s, a)$, and the white lines illustrate the distances to the target, $d(s, a)$, and to the obstacle, $o(s, a)$.
  • Figure 3: Trained Soft Actor-Critic agents operating within the Proposed environment across various experiments. The orange circle denotes the target goal, while the red dots trace the actions executed by the agent throughout the episode. The red line maps the trajectory followed by the robot. As depicted in Figure \ref{fig:res_fscaps}, the agent demonstrates precision by consistently reaching the target without deviation.
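
The geometric quantities named in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the caption does not confirm: that distances are measured from the robot's position, that $\delta(s, a)$ is the angle at the robot between the direction to the action point and the direction to the target, and that all points are 2D field coordinates. The function names `distance` and `angular_difference` are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def angular_difference(robot, action, target):
    """Absolute angle at the robot between the action and target directions.

    ASSUMPTION: this reading of delta(s, a) is illustrative only.
    The result is wrapped into [0, pi].
    """
    to_action = math.atan2(action[1] - robot[1], action[0] - robot[0])
    to_target = math.atan2(target[1] - robot[1], target[0] - robot[0])
    # Wrap the signed difference into [-pi, pi) before taking the magnitude.
    diff = (to_action - to_target + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


# Hypothetical positions: robot R, action a, target T, obstacle O.
robot, action, target, obstacle = (0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)

delta = angular_difference(robot, action, target)  # angle between a and T as seen from R
d = distance(robot, target)                        # ASSUMPTION: measured from the robot
o = distance(robot, obstacle)                      # ASSUMPTION: measured from the robot
```

With these example positions the action points along the x-axis while the target lies on the y-axis, so the angular difference is a right angle. How these quantities combine into the example rewards shown in the figure is not stated in the caption and is left out here.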