Semantic Landmark Detection & Classification Using Neural Networks For 3D In-Air Sonar

Wouter Jansen, Jan Steckel

TL;DR

This work addresses robust landmark-based SLAM in harsh environments where optical sensing fails. A CNN ingests cochleogram representations of echoes from the eRTIS in-air 3D sonar sensor and jointly classifies ten reflector landmarks (plus a no-landmark case) and regresses their orientation. It reaches 97.3% classification accuracy on the test set, an orientation RMSE of about 9.15 degrees, and near-100% detection of empty scenes. The results suggest that semantically defined acoustic landmarks can improve SLAM reliability and autonomous navigation in challenging conditions.

Abstract

In challenging environments where traditional sensing modalities struggle, in-air sonar offers resilience to optical interference. Placing a priori known landmarks in these environments can eliminate accumulated errors in autonomous mobile system applications such as Simultaneous Localization and Mapping (SLAM) and autonomous navigation. We present a novel approach that uses a convolutional neural network (CNN) to detect and classify ten different reflector landmarks with varying radii using in-air 3D sonar. Additionally, the network predicts the orientation angle of the detected landmarks. The neural network is trained on cochleograms, which represent the echoes received by the sensor in the time-frequency domain. Experimental results in cluttered indoor settings show promising performance. The CNN achieves a 97.3% classification accuracy on the test dataset, accurately detecting both the presence and absence of landmarks. Moreover, the network predicts landmark orientation angles with an RMSE lower than 10 degrees, enhancing its utility in SLAM and autonomous navigation applications. This advancement improves the robustness and accuracy of autonomous systems in challenging environments.
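
The two headline metrics, classification accuracy and orientation RMSE, can be made concrete with a small sketch. The snippet below is illustrative only: the function names, the sample values, and the encoding of "no landmark" as class 10 are assumptions, not details taken from the paper. The wrap-around handling of angle differences is likewise an assumption about how angular errors might be treated.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def angular_error_deg(true_deg, predicted_deg):
    """Signed angle difference in degrees, wrapped to [-180, 180)."""
    return (predicted_deg - true_deg + 180.0) % 360.0 - 180.0


def rmse_deg(true_angles, predicted_angles):
    """Root-mean-square error of the wrapped angle differences, in degrees."""
    if len(true_angles) != len(predicted_angles) or not true_angles:
        raise ValueError("angle lists must be non-empty and of equal length")
    squared = [
        angular_error_deg(t, p) ** 2
        for t, p in zip(true_angles, predicted_angles)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example values (not data from the paper).
# Class 10 stands in for "no landmark" here; this encoding is an assumption.
labels_true = [0, 3, 3, 10, 7]
labels_pred = [0, 3, 2, 10, 7]
angles_true = [-30.0, 0.0, 15.0, 45.0]
angles_pred = [-24.0, 8.0, 15.0, 35.0]

print(f"accuracy: {accuracy(labels_true, labels_pred):.3f}")
print(f"orientation RMSE: {rmse_deg(angles_true, angles_pred):.2f} deg")
```

Computing the RMSE on wrapped differences keeps a prediction of -170 degrees for a true angle of 170 degrees from being scored as a 340-degree miss. Whether that case arises depends on the orientation range used in the experiments, which this summary does not state.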

Paper Structure (7 sections, 5 figures)

Figures (5)

  • Figure 1: Drawing of a reflector and its dish shape, extracted from a sphere with radius $d$ cut off by a percentage factor $c$. The rotation $\gamma$ is also shown. The eRTIS sensor is always assumed to be parallel to the origin of the reflector landmark ($\gamma=0^{\circ}$). The azimuth angle $\theta$ and range $r$ at which the sensor observes the reflection after beamforming are also illustrated. The top-right photograph shows the real measurement setup with the pan-tilt device that controls the azimuth and elevation angles of the 3D-printed landmark.
  • Figure 2: The pre-processing steps for creating the datasets from the microphone-array signals.
  • Figure 3: The neural-network architecture, starting from the input cochleogram, for identifying the landmark reflectors and estimating their orientation using a classification network and a regression network, respectively. The two network outputs share a common architecture of three convolutional layers and two Fully Connected Layers (FCL). Batch normalization and rectifier layers were also used but are omitted from the diagram for readability.
  • Figure 4: Confusion matrix of the landmark classification results for the ten landmark sizes and for echoes containing no landmark.
  • Figure 5: The functional box plot results of the regression for determining the azimuth angle of the landmark.
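
The caption of Figure 3 describes shared layers feeding a classification output and a regression output. As a minimal sketch of how such a two-output setup might be scored during training, the plain-Python snippet below combines a softmax cross-entropy term for the landmark class with a squared-error term for the orientation. The loss form, the weighting factor `regression_weight`, the eleven-class layout (ten landmarks plus "no landmark"), and all numeric values are assumptions for illustration; the training objective actually used in the paper is not stated in this summary.

```python
import math

NUM_CLASSES = 11  # assumed: ten landmark sizes plus a "no landmark" class


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])


def joint_loss(class_logits, true_class, predicted_angle_deg, true_angle_deg,
               regression_weight=0.01):
    """Classification loss plus weighted squared orientation error.

    regression_weight is a hypothetical balancing factor, not a value
    reported in the paper.
    """
    classification_term = cross_entropy(class_logits, true_class)
    regression_term = (predicted_angle_deg - true_angle_deg) ** 2
    return classification_term + regression_weight * regression_term


# Made-up example: uninformative class scores and a 6-degree orientation miss.
example_logits = [0.0] * NUM_CLASSES
loss = joint_loss(example_logits, true_class=3,
                  predicted_angle_deg=10.0, true_angle_deg=4.0)
print(f"joint loss: {loss:.4f}")
```

Summing the two terms lets a single backward pass update the shared layers from both tasks; the relative weight controls how strongly orientation errors influence those shared features.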