MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration through Perception-Coupled Mapping and Planning

Tomáš Musil, Matěj Petrlík, Martin Saska

TL;DR

This work tackles monocular UAV exploration in unknown 3D environments by introducing a perception-coupled mapping and planning framework. It constructs a sphere-based map from sparse monocular SLAM points, augmented with depth interpolation (OVDE) and obstacle filtering (DBOF) to robustly represent free space and obstacles without dense sensors or GPUs. Frontiers are sampled directly on the free-space polyhedron, and exploration viewpoints are chosen to ensure sufficient translational motion for parallax-based depth estimation, improving safety and scalability. Extensive real-world and simulated experiments, including ablations, demonstrate large-scale indoor and outdoor exploration capabilities, supported by open-source code for future research.

Abstract

Autonomous exploration of unknown environments is a key capability for mobile robots, but it is largely unsolved for robots equipped with only a single monocular camera and no dense range sensors. In this paper, we present a novel approach to monocular vision-based exploration that can safely cover large-scale unstructured indoor and outdoor 3D environments by explicitly accounting for the properties of a sparse monocular SLAM frontend in both mapping and planning. The mapping module solves the problems of sparse depth data, free-space gaps, and large depth uncertainty by oversampling free space in texture-sparse areas and keeping track of obstacle position uncertainty. The planning module handles the added free-space uncertainty through rapid replanning and perception-aware heading control. We further show that frontier-based exploration is possible with sparse monocular depth data when parallax requirements and the possibility of textureless surfaces are taken into account. We evaluate our approach extensively in diverse real-world and simulated environments, including ablation studies. To the best of the authors' knowledge, the proposed method is the first to achieve 3D monocular exploration in real-world unstructured outdoor environments. We open-source our implementation to support future research.
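The parallax requirement mentioned above can be made concrete with a small geometric sketch. It assumes a simplified symmetric setup in which a point lies at depth `depth_m` on the perpendicular bisector of the camera translation (baseline). The function names and the example threshold are illustrative assumptions, not values or code from the paper.

```python
import math


def parallax_angle_deg(baseline_m: float, depth_m: float) -> float:
    """Angle subtended at the point by two camera positions `baseline_m` apart.

    Assumes the point sits at `depth_m` on the perpendicular bisector of the
    baseline (a simplification chosen for illustration).
    """
    return math.degrees(2.0 * math.atan2(baseline_m / 2.0, depth_m))


def min_baseline_m(depth_m: float, min_angle_deg: float) -> float:
    """Translation needed to reach `min_angle_deg` of parallax at `depth_m`."""
    return 2.0 * depth_m * math.tan(math.radians(min_angle_deg) / 2.0)


if __name__ == "__main__":
    # Hypothetical threshold: the same parallax angle needs ten times the
    # translation when the point is ten times farther away.
    for depth in (2.0, 20.0):
        print(depth, round(min_baseline_m(depth, min_angle_deg=2.0), 3))
```

Under this model the required translation grows linearly with depth, which is one way to see why viewpoints that only rotate the camera cannot yield usable depth for a sparse monocular frontend.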

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: Illustration of the proposed approach. The mapping pipeline estimates depth using virtual (blue) and tracked (red) points from sparse monocular-inertial SLAM (left) running on board the UAV (top) to construct a large-scale map (right) consisting of free-space spheres (green), obstacle points (red) and frontier points (magenta).
  • Figure 2: Mapping pipeline overview. The OVDE and DBOF modules are illustrated in Figure 3 and Figure 4, respectively.
  • Figure 3: Open-area virtual-depth estimation diagram. The thick green points $\mathbf{x}_{vir, k}$ satisfy the condition defined in \ref{sec:fake} and are added to the construction of $\mathcal{F}_d$.
  • Figure 4: Top-down illustration of sphere sampling and obstacle point management. The visible free-space polyhedron $\mathcal{P}_f^{t_2}$ is created using the low-covariance SLAM points tracked at $t_2$ (black crosses) and the camera's pose $\mathbf{p}_{cam}^{t_2}$. The size of a point $\mathbf{x}$ corresponds to the distance $d_{\mathbf{x}}$ that it was measured from. Since our method prioritizes closer measurements (of obstacles and lack thereof), only the (dashed red) point seen far away from the camera at a previous time $t_1$, which now lies in $\mathcal{P}_f^{t_2}$ very close to the camera, will be deleted as noise. New spheres are inscribed inside $\mathcal{P}_f^{t_2}$, but their radii are also bounded by the protected points (solid red) $\mathcal{X}_p$, which were added to the map at $t_1$ and not deleted at $t_2$ even though they lie in $\mathcal{P}_f^{t_2}$. The two (dashed black) points observed far from the camera at $t_2$ are not added to the map, as there are more precisely localized (red) points nearby.
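
The point-management rules described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `neighbor_radius` parameter, and the exact comparison rules (a closer observation always wins) are hypothetical, and the point-in-polyhedron test is taken as a precomputed boolean.

```python
import math

Point = tuple[float, float, float]


def sphere_radius(center: Point, obstacle_points: list[Point], max_radius: float) -> float:
    """Radius of a free-space sphere at `center`, bounded by the nearest
    obstacle point (including protected points) and capped at `max_radius`."""
    nearest = min((math.dist(center, p) for p in obstacle_points), default=max_radius)
    return min(nearest, max_radius)


def should_delete(measured_from: float, camera_dist_now: float, inside_free_polyhedron: bool) -> bool:
    """Assumed rule: an old obstacle point is treated as noise only if it lies
    in the currently visible free space AND the camera is now closer to it than
    the distance it was originally measured from."""
    return inside_free_polyhedron and camera_dist_now < measured_from


def should_add(candidate: Point, measured_from: float,
               map_points: list[tuple[Point, float]], neighbor_radius: float) -> bool:
    """Assumed rule: skip a new point if a nearby map point was measured from
    a shorter distance, i.e. is more precisely localized."""
    for point, point_measured_from in map_points:
        if math.dist(candidate, point) <= neighbor_radius and point_measured_from < measured_from:
            return False
    return True
```

In this reading, a protected point is simply one for which `should_delete` returns `False` despite lying inside the free-space polyhedron, so it continues to bound the radii of newly sampled spheres.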