Seeing Where to Deploy: Metric RGB-Based Traversability Analysis for Aerial-to-Ground Hidden Space Inspection

Seoyoung Lee; Shaekh Mohammad Shithil; Durgakant Pushp; Lantao Liu; Zhangyang Wang

Abstract

Inspection of confined infrastructure such as culverts often requires accessing hidden spaces whose entrances are reachable primarily from elevated viewpoints. Aerial-ground cooperation enables a UAV to deploy a compact UGV for interior exploration, but selecting a suitable deployment region from aerial observations requires metric terrain reasoning involving scale ambiguity, reconstruction uncertainty, and terrain semantics. We present a metric RGB-based geometric-semantic reconstruction and traversability analysis framework for aerial-to-ground hidden space inspection. A feed-forward multi-view RGB reconstruction backbone produces dense geometry, while temporally consistent semantic segmentation yields a 3D semantic map. To enable deployment-relevant measurements without LiDAR-based dense mapping, we introduce an embodied motion prior that recovers metric scale by enforcing consistency between predicted camera motion and onboard platform egomotion. From the metrically grounded reconstruction, we construct a confidence-aware geometric-semantic traversability map and evaluate candidate deployment zones under explicit reachability constraints. Experiments on a tethered UAV-UGV platform demonstrate reliable deployment-zone identification in hidden space scenarios.
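The embodied motion prior described above can be illustrated with a small sketch. The abstract does not specify the estimator, so the closed-form least-squares fit below is an assumption, and the function names (`relative_displacements`, `estimate_metric_scale`) are hypothetical. It compares the lengths of consecutive camera displacements predicted by the reconstruction (in arbitrary units) with the corresponding displacements from onboard egomotion (in metres). Because only displacement lengths are used, the two trajectories do not need to share a coordinate frame.

```python
import math


def relative_displacements(positions):
    """Return distances between consecutive 3D positions."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]


def estimate_metric_scale(pred_positions, odom_positions, min_motion=1e-6):
    """Least-squares scale s minimizing sum_i (s * d_pred_i - d_odom_i)^2.

    pred_positions: camera centres from the reconstruction (unitless).
    odom_positions: platform positions from onboard state estimation (metres).
    Both lists are assumed to be time-synchronized, one entry per frame.
    """
    d_pred = relative_displacements(pred_positions)
    d_odom = relative_displacements(odom_positions)
    if len(d_pred) != len(d_odom):
        raise ValueError("trajectories must have the same number of poses")
    numerator = sum(p * o for p, o in zip(d_pred, d_odom))
    denominator = sum(p * p for p in d_pred)
    if denominator < min_motion:
        # Without camera motion the scale is unobservable.
        raise ValueError("insufficient motion to estimate scale")
    return numerator / denominator


# Toy example: the odometry trajectory is 2.5x larger and expressed in a
# rotated frame; only the displacement lengths matter.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
odom = [(0.0, 0.0, 0.0), (0.0, 2.5, 0.0), (-5.0, 2.5, 0.0)]
scale = estimate_metric_scale(pred, odom)
```

Multiplying the reconstructed points and camera translations by `scale` would then express the map in metres. A robust variant (for example, a median of per-step ratios) may be preferable when the egomotion estimate contains outliers.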

Paper Structure (21 sections, 20 equations, 5 figures, 3 tables)

Introduction
Related Work
Research Background and Preliminaries
Visual Geometry Grounded Transformer (VGGT)
3D Semantic Segmentation Mapping
Proposed Framework
Semantic and Geometric Reconstruction
RGB-based 3D Geometric-Semantic Reconstruction
Embodied Metric-Scale Geometric Reconstruction
Traversability Mapping from Aerial View
Semantic Information
Geometric Information
Geometric--Semantic Fusion Traversability Estimation
Deployment Zone Selection
Experiments and Results
...and 6 more sections

Figures (5)

Figure 1: Overview of the proposed RGB-based geometric and semantic reconstruction pipeline with embodied metric grounding. Given a top-down RGB video captured by a UAV, VGGT performs feed-forward multi-view geometric reconstruction to predict camera parameters and depth, from which dense 3D geometry is obtained via depth unprojection. In parallel, an adaptive segmentation module generates temporally consistent instance masks that are lifted into 3D to produce a geometry-aligned semantic reconstruction. To resolve the inherent scale ambiguity of monocular reconstruction, relative platform egomotion obtained from onboard state estimation is used to recover metric scale through motion-consistent alignment. The resulting metrically grounded geometric-semantic map enables extraction of deployment-relevant targets, including representative 3D centroids (via medoid estimation) and associated per-class confidence scores, for use in downstream traversability analysis, deployment analysis, and planning.
Figure 2: Experimental setup: an integrated aerial–ground robotic system. The DJI Matrice M600 hexacopter carries a compact tracked ground robot and integrates a Livox Mid-360 LiDAR, a RealSense camera, and a Jetson Xavier NX compute module. Dense LiDAR data are not used in the proposed method.
Figure 3: RGB-based geometric and semantic 3D reconstruction from purely top-down aerial views at increasing UAV elevations for two hidden-space inspection scenes: a culvert entrance (left) and a vent system occluded by surrounding structures (right). For each elevation, RGB reconstructions are shown above their corresponding semantic reconstructions for both top-down and ground-level views. Ground-level views also include the camera trajectory. Despite the limited viewpoint parallax of overhead observations, particularly at higher altitudes, our method maintains consistent ground-level geometry and accurately segments deployment-relevant targets across a wide range of flight altitudes using RGB input only. Semantic mask granularity can be adjusted to capture structures at different spatial scales as required.
Figure 4: Visual results of traversability mapping. From left to right: RGB-based 3D reconstruction, semantic segmentation, geometric traversability map, and geometric–semantic fusion result. The geometric map assigns high scores to planar regions but partially misclassifies rocky areas because the reconstructed geometry is smoothed. The fusion map assigns low traversability to terrain classes such as rocks and structural obstacles, producing a more conservative and deployment-consistent representation.
Figure 5: Deployment location selection and trajectory generation. High-traversability regions (green) indicate safe deployment areas, while red regions denote unsafe terrain. Candidate deployment locations (blue) are identified, and feasible trajectories are generated from the UAV to the deployment zone.
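
The fusion and selection steps illustrated in Figures 4 and 5 can be sketched on a small 2D grid. This is an illustrative sketch only: the captions do not give the fusion rule, so the confidence-weighted attenuation below is an assumption. The class weights, the threshold, the square footprint, and the names `CLASS_WEIGHT`, `fuse_cell`, `fuse_map`, and `select_deployment_cell` are all hypothetical.

```python
# Hypothetical per-class traversability weights in [0, 1]; not from the paper.
CLASS_WEIGHT = {"ground": 1.0, "grass": 0.8, "rock": 0.1, "structure": 0.0}


def fuse_cell(geom_score, label, confidence):
    """Attenuate a geometric score by a confidence-weighted semantic penalty.

    With confidence 0 the geometric score is returned unchanged; with
    confidence 1 it is multiplied by the class weight.
    """
    weight = CLASS_WEIGHT.get(label, 0.5)  # unknown classes treated as uncertain
    return geom_score * (1.0 - confidence * (1.0 - weight))


def fuse_map(geom, labels, conf):
    """Apply fuse_cell to every cell of three aligned 2D grids."""
    return [
        [fuse_cell(g, l, c) for g, l, c in zip(g_row, l_row, c_row)]
        for g_row, l_row, c_row in zip(geom, labels, conf)
    ]


def select_deployment_cell(fused, footprint=1, threshold=0.6):
    """Pick the cell whose square neighbourhood has the highest worst-case score.

    A cell qualifies only if every cell within `footprint` cells of it scores
    at least `threshold`. Returns (row, col, worst_score) or None.
    """
    rows, cols = len(fused), len(fused[0])
    best = None
    for r in range(footprint, rows - footprint):
        for c in range(footprint, cols - footprint):
            worst = min(
                fused[i][j]
                for i in range(r - footprint, r + footprint + 1)
                for j in range(c - footprint, c + footprint + 1)
            )
            if worst >= threshold and (best is None or worst > best[2]):
                best = (r, c, worst)
    return best


# Toy 3x3 map: geometry looks flat everywhere, but one corner is a rock.
geom = [[0.9] * 3 for _ in range(3)]
conf = [[1.0] * 3 for _ in range(3)]
flat_labels = [["ground"] * 3 for _ in range(3)]
rocky_labels = [["rock", "ground", "ground"]] + [["ground"] * 3 for _ in range(2)]

flat_choice = select_deployment_cell(fuse_map(geom, flat_labels, conf))
rocky_choice = select_deployment_cell(fuse_map(geom, rocky_labels, conf))
```

In the toy map, the rock cell passes the geometric test but is rejected after fusion, which mirrors the more conservative behaviour described in the Figure 4 caption. Reachability constraints and trajectory generation are not modelled in this sketch.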
