Embodied Escaping: End-to-End Reinforcement Learning for Robot Navigation in Narrow Environment

Han Zheng, Jiale Zhang, Mingyang Jiang, Peiyuan Liu, Danni Liu, Tong Qin, Ming Yang

TL;DR

This work targets the dead-zone problem in indoor robot navigation by proposing an embodied escaping approach that maps raw LiDAR and inertial data directly to control commands in an end-to-end, map-free setting. It couples a transformer-augmented SAC policy with a fast action mask and a 42-action discrete space, and introduces a hybrid training policy that leverages A* guidance during training to address sparse rewards. The main contributions are an efficient action representation with a precomputed action mask, a hybrid RL–planning training regime, and real-world validation showing superior escape success and collision avoidance versus traditional planners and baselines. The results demonstrate strong generalization to unseen, dynamic, and narrow environments, indicating practical applicability for autonomous cleaning robots in cluttered homes.

Abstract

Autonomous navigation is a fundamental task for robot vacuum cleaners in indoor environments. Since their core function is to clean entire areas, robots inevitably encounter dead zones in cluttered and narrow scenarios. Existing planning methods often fail to escape due to complex environmental constraints, high-dimensional search spaces, and high-difficulty maneuvers. To address these challenges, this paper proposes an embodied escaping model that leverages a reinforcement learning-based policy with an efficient action mask for dead zone escaping. To alleviate the sparse reward problem in training, we introduce a hybrid training policy that improves learning efficiency. To handle redundant and ineffective action options, we design a novel action representation that reshapes the discrete action space with a uniform turning radius. Furthermore, we develop an action mask strategy to select valid actions quickly, balancing precision and efficiency. In real-world experiments, our robot is equipped with a Lidar, an IMU, and two-wheel encoders. Extensive quantitative and qualitative experiments across varying difficulty levels demonstrate that our robot can consistently escape from challenging dead zones. Moreover, our approach significantly outperforms the compared path planning and reinforcement learning methods in terms of success rate and collision avoidance.

Paper Structure

This paper contains 17 sections, 6 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: (a) shows an escaping process for robot vacuum cleaners. (b) shows the basic idea of the embodied escaping model, which directly maps Lidar and IMU data to action. The end-to-end model is trained by the hybrid policy in a random simulator.
  • Figure 2: The structure of the proposed embodied escaping model. The observation of the state space is elaborated in the problem formulation section. The RL network is based on a SAC model with a transformer encoder. The hybrid training policy, which is run in a 2D simulator, is described in the hybrid training section. An efficient action mask is introduced in the action mask section.
  • Figure 3: In (a), we set the inflation radius to the radius of the robot's circumscribed circle to ensure collision-free navigation. In (b), the 2D path, initially planned without considering heading angles, is transformed into a sequence of in-place rotations and straight-line movements.
  • Figure 4: In (a), we apply proportional dimensionality reduction to the original action space, restructuring it into a discrete space with a uniform turning radius. In (b), the action mask rapidly distinguishes valid actions from invalid ones. In (c), a collision is detected when a Lidar point is closer than the corresponding boundary point.
  • Figure 5: The visualization of the escaping process. The starting boundary is highlighted in red, the boundary of the final state is highlighted in blue, and obstacles are filled in black. Red-to-blue gradient rectangles represent the intermediate states explored during the escaping process. In (a)-(d), the compared methods fail to escape because of collisions or endless exploration. In (e)-(h), our approach generates a reasonable trajectory in challenging feature-combination scenarios.
  • ...and 2 more figures
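
The collision test described in the Figure 4 caption (a Lidar point closer than the boundary point marks an action as invalid) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Lidar is assumed to sit at the robot center, the footprint is assumed rectangular, only the end pose of each action is checked rather than the full swept area, and the three sample actions stand in for the paper's 42-action space. The names `ray_exit_distance`, `precompute_boundaries`, and `action_mask` are hypothetical.

```python
import math

def ray_exit_distance(beam_angle, pose, half_len, half_wid):
    """Distance along a beam (from the Lidar origin) to the far edge of the
    footprint placed at `pose` = (x, y, yaw). Returns 0.0 if the beam misses."""
    cx, cy, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    # Express the ray origin and direction in the footprint's own frame.
    ox, oy = -(c * cx + s * cy), -(-s * cx + c * cy)
    bx, by = math.cos(beam_angle), math.sin(beam_angle)
    dx, dy = c * bx + s * by, -s * bx + c * by
    t_near, t_far = -math.inf, math.inf
    # Slab test against the two pairs of rectangle edges.
    for o, d, h in ((ox, dx, half_len), (oy, dy, half_wid)):
        if abs(d) < 1e-12:
            if abs(o) > h:
                return 0.0  # parallel to this slab and outside it
            continue
        t1, t2 = (-h - o) / d, (h - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    if t_near > t_far or t_far <= 0.0:
        return 0.0
    return t_far

def precompute_boundaries(actions, num_beams, half_len, half_wid):
    """Offline step: one row of per-beam boundary distances per action."""
    angles = [2.0 * math.pi * i / num_beams for i in range(num_beams)]
    return [[ray_exit_distance(a, pose, half_len, half_wid) for a in angles]
            for pose in actions]

def action_mask(scan, boundaries):
    """Online step: an action is valid only if no Lidar range is shorter
    than the precomputed boundary distance on the same beam."""
    return [all(r >= b for r, b in zip(scan, row)) for row in boundaries]

if __name__ == "__main__":
    # Hypothetical end poses: forward 0.3 m, backward 0.3 m, rotate 90 degrees.
    actions = [(0.3, 0.0, 0.0), (-0.3, 0.0, 0.0), (0.0, 0.0, math.pi / 2)]
    table = precompute_boundaries(actions, 8, half_len=0.2, half_wid=0.15)
    scan = [0.4] + [5.0] * 7  # obstacle 0.4 m straight ahead
    print(action_mask(scan, table))
```

Because the boundary table depends only on the robot geometry and the action set, it can be built once before deployment. The per-step mask then reduces to elementwise comparisons against the current scan, which is one plausible reading of how a precomputed mask trades some precision for speed.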