High-Speed Stereo Visual SLAM for Low-Powered Computing Devices

Ashish Kumar; Jaesik Park; Laxmidhar Behera

TL;DR

Jetson-SLAM is an accurate, GPU-accelerated stereo visual SLAM system that the authors report as the fastest available. It achieves resource efficiency through a data-sharing mechanism and is evaluated with two highly accurate SLAM backends (Full-BA and ICE-BA).

Abstract

We present an accurate and GPU-accelerated Stereo Visual SLAM design called Jetson-SLAM. It exhibits frame-processing rates above 60 FPS on NVIDIA's low-powered 10 W Jetson-NX embedded computer and above 200 FPS on desktop-grade 200 W GPUs, even in the stereo configuration and in the multiscale setting. Our contributions are threefold: (i) A Bounded Rectification technique that prevents many non-corner points from being tagged as corners in FAST detection, improving SLAM accuracy. (ii) A novel Pyramidal Culling and Aggregation (PyCA) technique that yields robust features while suppressing redundant ones at high speed by harnessing a GPU device. PyCA uses our new Multi-Location Per Thread (MLPT) culling strategy and Thread-Efficient Warp-Allocation (TEWA) scheme for the GPU, enabling Jetson-SLAM to achieve high accuracy and speed on embedded devices. (iii) The Jetson-SLAM library achieves resource efficiency through a data-sharing mechanism. Our experiments on three challenging datasets (KITTI, EuRoC, and KAIST-VIO) and two highly accurate SLAM backends (Full-BA and ICE-BA) show that Jetson-SLAM is the fastest available accurate and GPU-accelerated SLAM system (Fig. 1).
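
The idea of suppressing redundant features by keeping only the strongest corner in each image cell can be illustrated with a short sequential sketch. This is not the paper's MLPT or TEWA GPU implementation; the function name `cull_cells`, the list-of-rows response layout, and the convention that a response of 0 means "no corner" are assumptions made purely for illustration.

```python
def cull_cells(strength, cell_h, cell_w):
    """Keep only the strongest positive response in each cell_h x cell_w cell.

    strength: list of equal-length rows of corner responses (0 = no corner).
    Returns a list of (row, col, response) tuples, one per non-empty cell,
    ordered cell by cell (left to right, then top to bottom).
    """
    rows, cols = len(strength), len(strength[0])
    kept = []
    for r0 in range(0, rows, cell_h):          # top edge of each cell
        for c0 in range(0, cols, cell_w):      # left edge of each cell
            best = None
            for r in range(r0, min(r0 + cell_h, rows)):
                for c in range(c0, min(c0 + cell_w, cols)):
                    s = strength[r][c]
                    if s > 0 and (best is None or s > best[2]):
                        best = (r, c, s)
            if best is not None:
                kept.append(best)
    return kept


# Hypothetical 6 x 6 response map split into four 3 x 3 cells.
response = [
    [0, 5, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [3, 0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 2],
    [0, 4, 0, 0, 0, 0],
]
survivors = cull_cells(response, 3, 3)
print(survivors)
```

Each cell is independent of the others, which is what makes this kind of culling a natural fit for parallel execution on a GPU.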

Paper Structure (41 sections, 5 equations, 13 figures, 9 tables)

Introduction
Bounded Rectification
Pyramidal Culling and Aggregation (PyCA)
Frontend--Middle-end--Backend Design of Jetson-SLAM
Methodology
Bounded Rectification for Corner Detection
GPU Fundamentals
Pyramidal Culling and Aggregation (PyCA)
Feature Culling (FC)
Vertical Feature Culling
Horizontal Feature Culling
Thread Efficient Warp-Allocation (TEWA)
Pyramidal Feature Aggregation (PFA)
System Integration
$\mu$-Sec. Efficient FAST Detection
...and 26 more sections

Figures (13)

Figure 1: (a) Output of Jetson-SLAM's GPU-accelerated and resource-efficient Frontend--Middle-end design, (b) the output trajectory, (c) Frames-Per-Second benchmarking on Jetson-NX embedded computer, and (d) SLAM performance on a KITTI sequence.
Figure 2: (a) A non-corner pixel whose Bresenham-circle pixels are all bright [fast], (b) real-image examples of such points tagged as corners. Each rectangle denotes a pixel in a $7\times 7$ patch of the image.
Figure 3: Feature Culling (FC) for a $6 \times 5$ cell. T$_{ij}$ is a CUDA thread of a CUDA kernel [cudaguide]. The marker in each pixel indicates its corner strength.
Figure 4: (a) Coalesced and (b) non-coalesced memory access. The two legend symbols denote a contiguous memory block and the $i^{th}$ warp thread. In coalesced access, $32$ threads read in one machine cycle, whereas in non-coalesced access, the memory transactions are serialized [cudaguide].
Figure 5: Illustration of the FC + TEWA scheme. FC is applied over the CRF-Matrix and produces the strongest corner in each cell. '$C_i$' is a cell, and the two thread markers denote a working thread and an idle (wasted) thread in a warp, respectively, for a $3\times3$ cell size.
...and 8 more figures
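
The situation in Figure 2(a) can be reproduced with the standard FAST segment test, in which a pixel is accepted when a long enough contiguous arc of its 16-pixel Bresenham circle is uniformly brighter or darker than the center. The sketch below shows only this baseline failure case and does not implement Bounded Rectification. The arc length of 9 (FAST-9), the threshold of 20, and all function names are assumptions chosen for illustration.

```python
# (dx, dy) offsets of the 16-pixel Bresenham circle of radius 3.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_run(flags):
    """Length of the longest circular run of True values in flags."""
    n = len(flags)
    if all(flags):
        return n
    best = run = 0
    for f in flags + flags:  # doubled so runs can wrap around the circle
        run = run + 1 if f else 0
        best = max(best, run)
    return min(best, n)


def fast_segment_test(patch, threshold, arc=9):
    """Plain FAST segment test on the center pixel of a 7 x 7 patch."""
    center = patch[3][3]
    ring = [patch[3 + dy][3 + dx] for dx, dy in CIRCLE]
    brighter = [v > center + threshold for v in ring]
    darker = [v < center - threshold for v in ring]
    return max(longest_run(brighter), longest_run(darker)) >= arc


# An isolated dark dot on a uniformly bright background: not a corner,
# yet every pixel on the circle is brighter than the center.
dot = [[200] * 7 for _ in range(7)]
dot[3][3] = 50

# A completely flat patch for comparison.
flat = [[100] * 7 for _ in range(7)]

print(fast_segment_test(dot, 20), fast_segment_test(flat, 20))
```

The dot passes the test because all 16 circle pixels form one unbroken bright arc, which is the kind of false positive the caption of Figure 2 describes.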
