CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration

Chunyu Yang, Shengben Bi, Yihui Xu, Xin Zhang

TL;DR

This paper tackles efficient goal-oriented autonomous robot exploration under partial observability and sim-to-real transfer challenges. It introduces CTSAC, a Transformer-based extension of Soft Actor-Critic augmented with a periodic review-based curriculum and direction-aware LiDAR clustering, implemented on a ROS-Gazebo-PyTorch platform. Key contributions include leveraging historical context for better decision-making, a curriculum strategy to accelerate training while mitigating forgetting, and LiDAR preprocessing to narrow the sim-to-real gap, with successful validation in both simulation and real-world tests. Results show CTSAC achieves higher success rates and shorter exploration times than strong baselines and demonstrates strong transfer from simulation to the real world.

Abstract

With the increasing demand for efficient and flexible robotic exploration solutions, Reinforcement Learning (RL) is becoming a promising approach in the field of autonomous robotic exploration. However, current RL-based exploration algorithms often face limited environmental reasoning capabilities, slow convergence rates, and substantial challenges in Sim-To-Real (S2R) transfer. To address these issues, we propose a Curriculum Learning-based Transformer Reinforcement Learning Algorithm (CTSAC) aimed at improving both exploration efficiency and transfer performance. To enhance the robot's reasoning ability, a Transformer is integrated into the perception network of the Soft Actor-Critic (SAC) framework, leveraging historical information to improve the farsightedness of the strategy. A periodic review-based curriculum learning strategy is proposed, which enhances training efficiency while mitigating catastrophic forgetting during curriculum transitions. Training is conducted on the ROS-Gazebo continuous robotic simulation platform, with LiDAR clustering optimization to further reduce the S2R gap. Experimental results demonstrate that CTSAC outperforms state-of-the-art non-learning and learning-based algorithms in terms of success rate and success rate-weighted exploration time. Moreover, real-world experiments validate the strong S2R transfer capabilities of CTSAC.
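The abstract does not spell out how the periodic review is scheduled, so the following is only a minimal sketch of one way such a curriculum could work. It assumes stages are numbered from 0 (easiest) upward and that every `review_period`-th episode revisits an earlier stage instead of the current one. The function name `stage_for_episode`, the parameter `review_period`, and the cycling rule are all hypothetical and are not taken from the paper.

```python
def stage_for_episode(episode, current_stage, review_period=5):
    """Pick the curriculum stage to train on for this episode.

    Hypothetical rule (an assumption, not the paper's schedule): every
    `review_period`-th episode revisits an earlier stage, cycling through
    them in turn; all other episodes train on the current stage.
    """
    if current_stage > 0 and episode % review_period == 0:
        # Review episode: cycle through stages 0 .. current_stage - 1.
        return (episode // review_period) % current_stage
    return current_stage


# Episodes 1..10 while the agent is on stage 2, reviewing every 5th episode.
schedule = [stage_for_episode(ep, current_stage=2, review_period=5)
            for ep in range(1, 11)]
print(schedule)  # [2, 2, 2, 2, 1, 2, 2, 2, 2, 0]
```

Replaying earlier stages at a fixed interval is one plausible way to keep easier environments represented in training after a curriculum transition, which is the forgetting problem the abstract refers to.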

Paper Structure

This paper contains 21 sections, 9 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: CTSAC-based autonomous robot exploration system. The robot perceives its environment using LiDAR and IMU sensors and makes decisions through the CTSAC algorithm, enabling efficient exploration of the environment.
  • Figure 2: Overview of the CTSAC autonomous exploration system architecture.
  • Figure 3: LiDAR clustering optimization regarding robot direction. The figure illustrates an example with $d = 32$.
  • Figure 4: SAC architecture diagram.
  • Figure 5: Transformer-based architecture for the SAC network. $FC$ denotes a fully connected layer, while $\text{Mean}(n)$ and $\text{Cat}(n)$ represent dimension-wise averaging and concatenation operations along $n$ dimensions, respectively. The batch parameters are $B$ (batch size) and $T$ (sequence length per sample). $D_s$ and $D_a$ are the state feature dimension and action feature dimension, respectively.
  • ...and 3 more figures
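
The LiDAR clustering named in the Figure 3 caption can be illustrated with a small sketch. The caption gives only $d = 32$, so everything else here is an assumption: the scan is taken to cover 360 degrees with evenly spaced beams, `ranges[0]` is taken to point along the robot's heading, each of the `d` sectors keeps its minimum range, and invalid readings fall back to a hypothetical `max_range`. The paper's actual clustering may differ, for example by using direction-dependent sector widths.

```python
import math


def cluster_scan(ranges, d=32, max_range=10.0):
    """Reduce a 360-degree scan to d sector values (minimum range per sector).

    Assumptions (not from the paper): ranges[0] points along the robot's
    heading, beams are evenly spaced, and invalid readings (inf, NaN,
    non-positive) are replaced by max_range.
    """
    n = len(ranges)
    if n < d:
        raise ValueError("need at least d beams")
    sectors = [max_range] * d
    for i, r in enumerate(ranges):
        if not math.isfinite(r) or r <= 0.0:
            r = max_range
        k = i * d // n  # index of the sector that beam i falls into
        sectors[k] = min(sectors[k], min(r, max_range))
    return sectors


# A 360-beam scan: an obstacle 1 m straight ahead, one dropped reading.
scan = [5.0] * 360
scan[0] = 1.0
scan[359] = float("inf")
clustered = cluster_scan(scan, d=32)
print(len(clustered), clustered[0], clustered[31])  # 32 1.0 5.0
```

Keeping the minimum per sector is a conservative choice: the nearest obstacle in each direction survives the reduction, while a single noisy or dropped beam has little effect on the resulting 32-value observation.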