End-to-end Generative Spatial-Temporal Ultrasonic Odometry and Mapping Framework

Fuhua Jia, Xiaoying Yang, Mengshen Yang, Yang Li, Hang Xu, Adam Rushworth, Salman Ijaz, Heng Yu, Tianxiang Cui

TL;DR

This work tackles SLAM in smoke, dust, and other low-visibility environments where traditional sensors underperform. It introduces EGST-UOAM, an end-to-end generative framework that spatially encodes the scene with a 12-sensor ultrasonic array featuring overlapping fields of view and temporally encodes data with a sliding window, processing it through a transformer to generate dense scans and a CNN to estimate motion. The approach delivers real-time updates of maps and odometry at the sensor frequency and demonstrates feasibility through real-world experiments, showing competitive obstacle representation despite the ultrasonic modality’s limitations. Overall, the method offers a practical ultrasonic SLAM solution for challenging environments with potential impact on robotics operating in smoke, dust, and similar conditions.

Abstract

Performing simultaneous localization and mapping (SLAM) in low-visibility conditions, such as environments filled with smoke, dust and transparent objects, has long been a challenging task. Sensors like cameras and Light Detection and Ranging (LiDAR) are significantly limited under these conditions, whereas ultrasonic sensors offer a more robust alternative. However, the low angular resolution, slow update frequency, and limited detection accuracy of ultrasonic sensors present barriers for SLAM. In this work, we propose a novel end-to-end generative ultrasonic SLAM framework. This framework employs a sensor array with overlapping fields of view, leveraging the inherently low angular resolution of ultrasonic sensors to implicitly encode spatial features in conjunction with the robot's motion. Consecutive time frame data is processed through a sliding window mechanism to capture temporal features. The spatiotemporally encoded sensor data is passed through multiple modules to generate dense scan point clouds and robot pose transformations for map construction and odometry. The main contributions of this work include a novel ultrasonic sensor array that spatiotemporally encodes the surrounding environment, and an end-to-end generative SLAM framework that overcomes the inherent defects of ultrasonic sensors. Several real-world experiments demonstrate the feasibility and robustness of the proposed framework.
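
The sliding window step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each timestamp yields one vector of 12 range readings, matching the 12-sensor array; the class name `SlidingWindow`, the window length of 3, and the synthetic readings are hypothetical, since the window size and data format are not stated here. It is not the authors' implementation.

```python
from collections import deque

NUM_SENSORS = 12  # from the source: a 12-sensor array, one 1 x 12 vector per timestamp


class SlidingWindow:
    """Fixed-length buffer holding the most recent ultrasonic range vectors."""

    def __init__(self, window_size, num_sensors=NUM_SENSORS):
        self.num_sensors = num_sensors
        # A deque with maxlen drops the oldest frame automatically on append.
        self.frames = deque(maxlen=window_size)

    def push(self, ranges):
        """Add the range vector for the current timestamp."""
        if len(ranges) != self.num_sensors:
            raise ValueError(f"expected {self.num_sensors} readings, got {len(ranges)}")
        self.frames.append(tuple(float(r) for r in ranges))

    def is_full(self):
        """True once the window holds window_size frames."""
        return len(self.frames) == self.frames.maxlen

    def as_matrix(self):
        """Return the window as a list of rows, oldest frame first."""
        return [list(frame) for frame in self.frames]


# Hypothetical usage: five synthetic frames pushed through a window of length 3.
window = SlidingWindow(window_size=3)
for t in range(5):
    window.push([1.0 + 0.1 * t] * NUM_SENSORS)

matrix = window.as_matrix()
print(len(matrix), "frames of", len(matrix[0]), "readings each")
```

Only the three most recent frames remain in the window, so a downstream model always receives a fixed-size input that is refreshed at every sensor update.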

Paper Structure

This paper contains 11 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Angular resolution enhancement process, illustrated at a specific distance. The sensor aligned with the robot's x-axis is designated sensor $1$, and the remaining sensors are numbered clockwise, so that sensor $10$ aligns with the y-axis. As the robot rotates, sensors $12$, $1$, and $2$ are the first to detect the obstacle. The obstacle falls within both the overlapping and non-overlapping detection areas of sensor $1$, whereas sensors $12$ and $2$ cover it only with their overlapping regions. As the rotation continues, sensor $12$ is the first to disengage from the obstacle, after which sensor $3$ engages. Throughout this process, the overlapping and non-overlapping areas of each sensor gradually sweep across the obstacle, so spatial features are embedded in the detection results. If the sensor data is processed continuously over time, these hidden spatial features can be reconstructed.
  • Figure 2: System overview of the proposed EGST-UOAM framework at multiple timestamps. The figure illustrates how the robot spatially encodes the scene across consecutive states and how the proposed framework temporally encodes and processes the spatially encoded data. At time $T_x$, the robot is at an arbitrary position $(a)$, and the data captured by the ultrasonic sensor array forms a $1 \times 12$ vector. As the robot moves to position $(b)$ and reaches position $(c)$ several time frames later, the overlapping and non-overlapping detection areas of each sensor in the array gradually sweep over obstacles, encoding spatial features into the ultrasonic detection data. The colors represent the distances of obstacles detected by individual sensors in the array. Each ultrasonic sensor detects only the closest obstacle within its FOV. At each timestamp, the proposed framework first adds the ultrasonic sensor data from the current timestamp to the sliding window for temporal encoding. The sliding window data then passes through the transformer module for scan point cloud enhancement. The enhanced point cloud, along with the scan from the previous timestamp and the sliding window data, is subsequently fed into the CNN module to estimate the robot's transformation relative to its previous position.
  • Figure 3: Robot setup used in the experiment. The onboard PC uses an AMD 8945HS CPU for framework inference. We used an omnidirectional mobile robot platform, a 2-D LiDAR to provide reliable scan point clouds, and a Nokov motion capture system to provide odometry annotations.
  • Figure 4: Comparison of the scan quality of the proposed EGST-UOAM framework and LiDAR. Series I shows scenes with a single obstacle, and series II shows scenes with two arbitrary obstacles. Orange represents the raw ultrasonic array data at the current timestamp, blue is the predicted scan point cloud, and green is the LiDAR result. The results show that the proposed framework restores features such as walls and obstacle positions better than the raw ultrasonic data.
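
The sensor numbering and engagement order described in the Figure 1 caption can be reproduced with a small geometric sketch in Python. It assumes the 12 sensors are evenly spaced at 30 degrees, treats the obstacle as a single point, takes positive heading as counterclockwise rotation, and uses a hypothetical 70 degree beam width; none of these values are given in the captions, and the function names are illustrative only.

```python
NUM_SENSORS = 12                    # from the source: 12-sensor array
SPACING_DEG = 360.0 / NUM_SENSORS   # assumption: sensors are evenly spaced
BEAM_WIDTH_DEG = 70.0               # hypothetical beam width, wider than the spacing so FOVs overlap


def sensor_bearing_deg(sensor_id):
    """Bearing of a sensor axis in the robot frame, in degrees within [0, 360).

    Sensor 1 points along +x and numbering proceeds clockwise,
    which places sensor 10 on the +y axis.
    """
    return (-(sensor_id - 1) * SPACING_DEG) % 360.0


def angular_difference_deg(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def sensors_seeing(obstacle_bearing_deg, robot_heading_deg=0.0):
    """IDs of sensors whose beam contains a point obstacle at a world-frame bearing."""
    relative = (obstacle_bearing_deg - robot_heading_deg) % 360.0
    half_beam = BEAM_WIDTH_DEG / 2.0
    return [
        sensor_id
        for sensor_id in range(1, NUM_SENSORS + 1)
        if angular_difference_deg(sensor_bearing_deg(sensor_id), relative) < half_beam
    ]


# Rotate the robot counterclockwise past a fixed obstacle on the world x-axis.
for heading in (0.0, 10.0, 30.0):
    print(heading, sensors_seeing(0.0, heading))
```

Under these assumptions the detecting set moves from sensors 12, 1, and 2, to sensors 1 and 2, to sensors 1, 2, and 3 as the heading increases, so sensor 12 disengages before sensor 3 engages, which mirrors the order in the caption. The changing set of overlapping beams is what carries angular information finer than a single sensor's field of view.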