Simultaneous Localization and 3D-Semi Dense Mapping for Micro Drones Using Monocular Camera and Inertial Sensors

Jeryes Danial, Yosi Ben Asher, Itzik Klein

TL;DR

The paper tackles real-time monocular SLAM for micro-drones under tight compute constraints, where scale ambiguity and sparse maps hinder detailed perception. It proposes a lightweight edge-aware SLAM that fuses sparse epipolar geometry with dense edge-depth from a MobileNet-based FastDepth predictor, and refines this information through a local edge-aware bundle adjustment with losses for reprojection, cycle consistency, and edge shape. A key element is fusing relative visual motion with inertial data in an EKF to recover metric scale and enhance robustness, enabling real-time performance on platforms like the DJI Tello. Experimental results on the TUM RGB-D dataset show dramatic improvements over ORB-SLAM2 in APE/ATE and demonstrate robust indoor mapping and navigation capabilities on constrained hardware, highlighting practical impact for embedded autonomous systems.
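The reprojection term mentioned above can be illustrated with a minimal sketch. This is a generic pinhole-camera reprojection error, not the paper's exact loss, which this summary does not spell out. The function names, the pose convention (world-to-camera rotation `R` and translation `t`), and the intrinsics `fx, fy, cx, cy` are all assumptions made for illustration.

```python
import math


def transform(R, t, point_world):
    """Map a 3D world point into the camera frame: p_cam = R * p_world + t."""
    return tuple(
        sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
        for i in range(3)
    )


def project(point_cam, fx, fy, cx, cy):
    """Project a camera-frame point to pixel coordinates with a pinhole model."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(R, t, point_world, observed_px, fx, fy, cx, cy):
    """Euclidean pixel distance between the projected and the observed point."""
    u, v = project(transform(R, t, point_world), fx, fy, cx, cy)
    return math.hypot(u - observed_px[0], v - observed_px[1])
```

A bundle adjustment step would typically minimize the sum of squared values of this error over poses and points; the cycle-consistency and edge-shape terms would be added on top and are not sketched here.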

Abstract

Monocular simultaneous localization and mapping (SLAM) algorithms estimate drone poses and build a 3D map using a single camera. Current algorithms include sparse methods that lack detailed geometry, while learning-driven approaches produce dense maps but are computationally intensive. Monocular SLAM also faces scale ambiguities, which affect its accuracy. To address these challenges, we propose an edge-aware lightweight monocular SLAM system combining sparse keypoint-based pose estimation with dense edge reconstruction. Our method employs deep learning-based depth prediction and edge detection, followed by optimization to refine keypoints and edges for geometric consistency, without relying on global loop closure or heavy neural computations. We fuse inertial data with vision by using an extended Kalman filter to resolve scale ambiguity and improve accuracy. The system operates in real time on low-power platforms, as demonstrated on a DJI Tello drone with a monocular camera and inertial sensors. In addition, we demonstrate robust autonomous navigation and obstacle avoidance in indoor corridors and on the TUM RGB-D dataset. Our approach offers an effective, practical solution to real-time mapping and navigation in resource-constrained environments.
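To make the scale-recovery idea concrete, here is a minimal sketch of a scalar Kalman filter that estimates a single scale factor `s`. It is a deliberately simplified stand-in for the paper's extended Kalman filter, whose state and measurement models are not given in this summary. It assumes that each up-to-scale visual step length `d_vis` is paired with a metric step length `d_met` derived from inertial data, and that `d_met = s * d_vis + noise` with `s` constant. All names and default values are hypothetical.

```python
def estimate_scale(visual_steps, metric_steps,
                   meas_var=0.01, init_scale=1.0, init_var=10.0):
    """Estimate a constant scale factor s with a scalar Kalman filter.

    Assumed model (illustrative only): metric_step = s * visual_step + noise.
    Returns the final scale estimate and its variance.
    """
    s, p = init_scale, init_var
    for d_vis, d_met in zip(visual_steps, metric_steps):
        h = d_vis                               # measurement sensitivity to s
        innovation = d_met - h * s              # measured minus predicted step
        innovation_var = h * p * h + meas_var
        gain = p * h / innovation_var
        s += gain * innovation                  # state update
        p *= 1.0 - gain * h                     # variance update
    return s, p
```

Calling `estimate_scale([0.2] * 50, [0.5] * 50)` drives the estimate toward the true ratio of 2.5 while shrinking the variance. A full visual-inertial filter would also track pose, velocity, and sensor biases, which this sketch omits.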

Paper Structure

This paper contains 9 sections, 30 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Pipeline for our proposed edge-aware constraint SLAM approach.
  • Figure 2: L-shape semantic constraint: intersecting lines detected by Canny edge detection (left) and a pair of reprojected lines that intersect each other, forming an approximate L shape (right).
  • Figure 3: APE w.r.t. full transformation (unitless) for (left) Edge-Aware SLAM (ours) and (right) ORB-SLAM2 (baseline).
  • Figure 4: (left) 3D map from our Edge-SLAM during initial frames in TUM and (right) 3D point cloud from ORB-SLAM2. Our method produces a clear, accurate 3D map early on, whereas ORB-SLAM2's point cloud is less interpretable at this stage.
  • Figure 5: (left) Our complete map after processing all frames, showing high accuracy and no drift and (right) ORB-SLAM2’s map, less clear and less interpretable. Our system produces a more accurate, coherent map over time, unlike ORB-SLAM2.
  • ...and 1 more figures