Spatiotemporal Attention Enhances Lidar-Based Robot Navigation in Dynamic Environments

Jorge de Heuvel, Xiangyu Zeng, Weixian Shi, Tharun Sethuraman, Maren Bennewitz

TL;DR

A spatiotemporal attention pipeline for enhanced navigation based on 2D lidar sensor readings is introduced, complemented by a novel lidar-state representation that emphasizes dynamic obstacles over static ones, resulting in improved overall navigation performance within dynamic scenarios.

Abstract

Foresighted robot navigation in dynamic indoor environments with cost-efficient hardware necessitates the use of a lightweight yet dependable controller. Inferring the scene dynamics from sensor readings without explicit object tracking is therefore a pivotal aspect of foresighted navigation among pedestrians. In this paper, we introduce a spatiotemporal attention pipeline for enhanced navigation based on 2D lidar sensor readings. This pipeline is complemented by a novel lidar-state representation that emphasizes dynamic obstacles over static ones. The attention mechanism then enables selective scene perception across both space and time, resulting in improved overall navigation performance within dynamic scenarios. We thoroughly evaluated the approach in different scenarios and simulators, finding excellent generalization to unseen environments. The results demonstrate outstanding performance compared to state-of-the-art methods, thereby enabling the seamless deployment of the learned controller on a real robot.
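
To make the idea of selective scene perception concrete, the following is a minimal sketch of attention over lidar sectors. It assumes scaled dot-product attention with a single query vector, which the abstract does not specify; the function names `softmax` and `attend` and all feature values are hypothetical and only illustrate the weighting principle, not the paper's actual attention block.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  (d,)    hypothetical feature of the navigation goal
    keys:   (n, d)  hypothetical features, one per lidar sector
    values: (n, m)  hypothetical per-sector content to be aggregated
    Returns the weighted sum of values and the attention weights.
    """
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ values, weights


# Toy example: three sectors, the first one aligned with the query.
query = np.array([1.0, 0.0])
keys = np.array([[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]])
values = np.array([[1.0], [2.0], [3.0]])
context, weights = attend(query, keys, values)
```

In this toy setup the weights sum to one and the first sector receives the largest weight, so its value dominates the aggregated feature.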

Paper Structure (29 sections, 2 equations, 7 figures, 1 table, 1 algorithm)

This paper contains 29 sections, 2 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Our pipeline for learning a robot navigation controller based on lidar. Two attention mechanisms reason about the importance of individual lidar sectors with respect to known and unknown dynamic obstacles. Our Temporal Accumulation Group Descriptors (TAGD) reveal moving obstacles from subsequent lidar scans affected by robot self-motion.
  • Figure 2: Schematic of the TAGD generation process. The ICP alignment of two subsequent lidar scans (1) in 2D Cartesian coordinates reduces the effect of robot self-movement (2). This allows better differentiation between dynamic obstacles and static obstacles. The aligned scan is grouped and clustered around ray-cast centers (3). From the clustered points (4), the position difference of the centroid from both time steps reveals a moving obstacle (5).
  • Figure 3: Illustration of our architecture. a) The indoor environment provides lidar readings to the deep reinforcement learning agent that drives a differential-wheeled robot via linear and angular velocity commands. b) From subsequent lidar readings, the TAGDs are computed. Merged with the five upcoming waypoints of the global path and the raw lidar readings as observations, they are c) processed by the agent in a separate spatial and temporal stream. Both streams feature an attention block to weigh the importance of d) individual lidar sectors (spatial) or e) the TAGDs (temporal), with respect to the upcoming waypoints. After feature extraction, both streams are concatenated for further processing in the final output network of the actor-critic agent.
  • Figure 4: The PyBullet-based environments of de Heuvel et al. (2023) are used for training. a) In the corridor and b) intersection environments, the wall distances are randomized (blue). c) In the office environment, the outer walls are fixed with randomized inner wall placement for diverse room setups.
  • Figure 5: Performance overview for all approaches averaged over 1,000 episodes with identical scene setups in all three PyBullet environments for a) increasing obstacle speeds, with two dynamic pedestrians and one static pedestrian, and b) increasing number of obstacles, with a fixed pedestrian speed of $0.6\,\mathrm{m/s}$.
  • ...and 2 more figures
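
The TAGD generation steps in the Figure 2 caption (align, group, compare centroids) can be sketched roughly as follows. This is an illustrative simplification under stated assumptions, not the authors' implementation: the rigid transform between scans is taken as given (in the paper it comes from ICP), fixed angular sectors stand in for the clustering around ray-cast centers, and all function names and numbers are hypothetical.

```python
import numpy as np


def scan_to_points(ranges, fov=2 * np.pi):
    """Convert lidar ranges to 2D Cartesian points (N, 2); assumes evenly spaced rays."""
    ranges = np.asarray(ranges, dtype=float)
    angles = np.linspace(-fov / 2, fov / 2, len(ranges), endpoint=False)
    return np.stack([ranges * np.cos(angles), ranges * np.sin(angles)], axis=1)


def align_to_current(prev_pts, rotation, translation):
    """Map the previous scan into the current robot frame.

    The transform is assumed to be known here; in the paper it is
    estimated by ICP between the two scans.
    """
    return prev_pts @ rotation.T + translation


def sector_centroids(pts, n_sectors):
    """Centroid of the points in each angular sector (NaN for empty sectors)."""
    angles = np.arctan2(pts[:, 1], pts[:, 0])
    idx = np.floor((angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    out = np.full((n_sectors, 2), np.nan)
    for s in range(n_sectors):
        mask = idx == s
        if mask.any():
            out[s] = pts[mask].mean(axis=0)
    return out


def centroid_shift(prev_aligned, curr_pts, n_sectors=4):
    """Per-sector centroid displacement; sectors empty in either scan yield zero."""
    shift = sector_centroids(curr_pts, n_sectors) - sector_centroids(prev_aligned, n_sectors)
    return np.nan_to_num(shift)


# Toy scene: a static wall segment and one walking pedestrian (made-up numbers).
static = np.array([[-3.0, -3.0], [-3.5, -2.5]])
walker = np.array([[2.0, 1.0], [2.2, 1.1]])
robot_step = np.array([0.1, 0.0])    # robot translation between scans
walker_step = np.array([0.0, 0.2])   # pedestrian motion between scans

prev_pts = np.vstack([static, walker])                             # previous robot frame
curr_pts = np.vstack([static, walker + walker_step]) - robot_step  # current robot frame
prev_aligned = align_to_current(prev_pts, np.eye(2), -robot_step)
shift = centroid_shift(prev_aligned, curr_pts)
```

After alignment, the sector containing the static points shows no centroid shift, while the sector containing the pedestrian shifts by the pedestrian's own displacement. Without the alignment step, the robot's self-motion would appear as apparent motion in every sector.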