End-to-end Generative Spatial-Temporal Ultrasonic Odometry and Mapping Framework
Fuhua Jia, Xiaoying Yang, Mengshen Yang, Yang Li, Hang Xu, Adam Rushworth, Salman Ijaz, Heng Yu, Tianxiang Cui
TL;DR
This work tackles SLAM in smoke, dust, and other low-visibility environments where cameras and LiDAR underperform. It introduces EGST-UOAM, an end-to-end generative framework that encodes the scene spatially with a 12-sensor ultrasonic array whose fields of view overlap, and temporally with a sliding window over consecutive frames. A transformer generates dense scans from the encoded data, and a CNN estimates motion. The approach updates the map and odometry in real time at the sensor frequency, and real-world experiments demonstrate its feasibility, showing competitive obstacle representation despite the ultrasonic modality’s limitations. The result is a practical ultrasonic SLAM option for robots operating in smoke, dust, and similar conditions.
Abstract
Performing simultaneous localization and mapping (SLAM) in low-visibility conditions, such as environments containing smoke, dust, or transparent objects, has long been a challenging task. Sensors like cameras and Light Detection and Ranging (LiDAR) are significantly limited under these conditions, whereas ultrasonic sensors offer a more robust alternative. However, the low angular resolution, slow update frequency, and limited detection accuracy of ultrasonic sensors present barriers for SLAM. In this work, we propose a novel end-to-end generative ultrasonic SLAM framework. The framework employs a sensor array with overlapping fields of view, leveraging the inherently low angular resolution of ultrasonic sensors to implicitly encode spatial features in conjunction with the robot's motion. Data from consecutive time frames are processed through a sliding window mechanism to capture temporal features. The spatiotemporally encoded sensor data are passed through multiple modules to generate dense scan point clouds and robot pose transformations for map construction and odometry. The main contributions of this work are a novel ultrasonic sensor array that spatiotemporally encodes the surrounding environment, and an end-to-end generative SLAM framework that overcomes the inherent defects of ultrasonic sensors. Several real-world experiments demonstrate the feasibility and robustness of the proposed framework.
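To make the sliding-window encoding and the pose-based odometry more concrete, the following Python sketch shows one way the data flow might look. It is an illustration under stated assumptions, not the authors' implementation: the window length of 8, the class name `SlidingWindowEncoder`, the function `compose_pose`, and the planar (x, y, heading) pose model are all hypothetical; only the 12-sensor array size comes from the text. The transformer and CNN modules are not modeled here.

```python
import math
from collections import deque

import numpy as np

NUM_SENSORS = 12  # array size stated in the text
WINDOW_SIZE = 8   # hypothetical window length, chosen only for illustration


class SlidingWindowEncoder:
    """Keeps the most recent frames of range readings (hypothetical helper)."""

    def __init__(self, num_sensors=NUM_SENSORS, window_size=WINDOW_SIZE):
        self.num_sensors = num_sensors
        self.window_size = window_size
        # A bounded deque drops the oldest frame automatically when full.
        self._frames = deque(maxlen=window_size)

    def push(self, ranges):
        """Add one frame; return a (window_size, num_sensors) array once full."""
        frame = np.asarray(ranges, dtype=np.float32)
        if frame.shape != (self.num_sensors,):
            raise ValueError(f"expected {self.num_sensors} readings per frame")
        self._frames.append(frame)
        if len(self._frames) < self.window_size:
            return None  # not enough history yet
        # Rows are time steps (oldest first), columns are sensors.
        return np.stack(list(self._frames), axis=0)


def compose_pose(pose, delta):
    """Apply a body-frame increment (dx, dy, dtheta) to a planar pose.

    Assumes a 2D pose (x, y, theta); the source does not specify the
    pose parameterization.
    """
    x, y, theta = pose
    dx, dy, dtheta = delta
    x += math.cos(theta) * dx - math.sin(theta) * dy
    y += math.sin(theta) * dx + math.cos(theta) * dy
    theta = math.atan2(math.sin(theta + dtheta), math.cos(theta + dtheta))
    return (x, y, theta)
```

In this sketch, each new frame yields one window once the buffer is full, so a downstream model could produce an output at every sensor update, consistent with the stated per-frame update rate. Chaining `compose_pose` over the per-window motion estimates would accumulate an odometry trajectory.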
