UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments

Ahmed Radwan, Ali Tourani, Hriday Bavle, Holger Voos, Jose Luis Sanchez-Lopez

TL;DR

This work presents an end-to-end UAV system that fuses marker-based Visual SLAM (VSLAM) with an RGB-D sensor to reconstruct indoor maps and generate multi-layered 3D scene graphs in GPS-denied environments. Built on ORB-SLAM3 with ArUco marker detection, the framework augments SLAM with semantic entities such as walls, doors, rooms, and corridors, enabling hierarchical scene representations. Hardware integration includes a Mikrokopter2 drone, RealSense D435, an onboard Intel NUC, and ROS Noetic, validating online and offline operation in real-world indoor scenarios. Experimental results show that the marker-augmented VSLAM provides competitive map accuracy and richer scene graphs compared to a baseline, with practical benefits for situational awareness and potential for localization/navigation enhancements in future work.

Abstract

Aerial robots play a vital role in many applications in which situational awareness of the environment is a fundamental requirement. In one such use case, drones operating in GPS-denied environments must be equipped with sensors (e.g., vision sensors) that provide reliable measurements for pose estimation and localization. This paper targets reconstructing maps of indoor environments and generating 3D scene graphs as a high-level representation, using a camera mounted on a drone. Accordingly, an aerial robot equipped with a companion computer and an RGB-D camera was built and integrated with a Visual Simultaneous Localization and Mapping (VSLAM) framework proposed by the authors. To enhance the robot's situational awareness while reconstructing maps, various structural elements, including doors and walls, were labeled with printed fiducial markers, and a dictionary of the topological relations among them was fed to the system. The VSLAM system detects the markers and reconstructs a map of the indoor areas enriched with higher-level semantic entities, including corridors and rooms. The system also generates multi-layered, vision-based situational graphs containing hierarchical representations of the indoor environment. Integrating VSLAM into the drone is thus the primary goal of this paper, providing an end-to-end robot application for GPS-denied environments. To show the practicality of the system, various real-world experiments were conducted in indoor scenarios with dissimilar structural layouts. Evaluations show that the proposed drone application performs adequately with respect to the ground-truth data and its baseline.
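
As a rough illustration of how a dictionary of topological relations among marker-labeled elements could be turned into a layered graph, the following minimal Python sketch may help. It is not the authors' implementation: the names `SceneGraph`, `TOPOLOGY`, and `build_scene_graph`, the marker IDs, and the three-layer layout (spaces, elements, markers) are all hypothetical, and the geometric reasoning a real VSLAM back end would perform is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy layered graph: layer name -> node set, plus parent/child edges."""
    layers: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, layer, node):
        self.layers.setdefault(layer, set()).add(node)

    def add_edge(self, parent, child):
        self.edges.add((parent, child))


# Hypothetical topological dictionary: which marker IDs label the walls and
# doors of each room or corridor. All IDs and names are made up for this sketch.
TOPOLOGY = {
    "room_1": {"walls": [1, 2, 3, 4], "doors": [10]},
    "corridor_1": {"walls": [5, 6], "doors": [10]},
}


def build_scene_graph(detected_marker_ids, topology=TOPOLOGY):
    """Link each detected marker to its structural element and parent space."""
    detected = set(detected_marker_ids)
    graph = SceneGraph()
    for space, members in topology.items():
        for kind in ("walls", "doors"):
            for marker_id in members.get(kind, []):
                if marker_id not in detected:
                    continue  # only markers actually seen contribute nodes
                element = f"{kind[:-1]}_{marker_id}"  # e.g. "wall_1", "door_10"
                graph.add_node("spaces", space)
                graph.add_node("elements", element)
                graph.add_node("markers", marker_id)
                graph.add_edge(space, element)
                graph.add_edge(element, marker_id)
    return graph


graph = build_scene_graph([1, 2, 10])
```

In this toy setup a door marker shared by a room and a corridor links both spaces to the same door node, which is one simple way to encode adjacency between higher-level entities.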

Paper Structure

This paper contains 14 sections, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Reconstructing the map of an indoor environment, along with its 3D hierarchical representation and detected semantic entities generated by integrating a marker-based VSLAM framework [vsgraphs] on a drone.
  • Figure 2: The primary system components and pipeline of the VSLAM framework used in this research [vsgraphs].
  • Figure 3: The drone used in the research for integration with the VSLAM framework and experiments.
  • Figure 4: The reference frames considered in the current framework for perception calculations.
  • Figure 5: Some instances of the collected dataset, where ArUco markers are used to label walls and door frames of the environments.
  • ...and 2 more figures