HALO: High-Altitude Language-Conditioned Monocular Aerial Exploration and Navigation

Yuezhan Tao, Dexter Ong, Fernando Cladera, Jason Hughes, Camillo J. Taylor, Pratik Chaudhari, Vijay Kumar

TL;DR

HALO tackles high-altitude monocular mapping and language-driven exploration by integrating a real-time feed-forward 3D reconstruction backbone with dense language-embedded semantics and a hierarchical planner. The approach maintains open-set metric-semantic maps onboard, fusing GPS priors for scale, performing loop closures, and using frontier-based global planning with a local ATSP-driven exploration strategy. In simulation, HALO completes tasks with less exploration time and achieves up to 68% higher competitive ratio, measured by distance traveled, than the state-of-the-art semantic exploration baseline. Real-world flights at an altitude of 40 m show that all modules run onboard and support missions covering up to 24,600 sq. m. Overall, HALO enables flexible, task-driven autonomous missions in large outdoor environments using a lightweight, onboard monocular system.
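
To make the "local ATSP-driven" step concrete, the following is a minimal sketch of ordering a handful of local targets with an asymmetric cost matrix. It is not HALO's planner: the text above does not say how the ATSP is formulated or solved, so the open-path variant (start at the robot, no return leg), the brute-force search, the function name `solve_atsp_open`, and the cost values are all assumptions made for illustration.

```python
from itertools import permutations


def solve_atsp_open(cost):
    """Return (order, total_cost) of the cheapest path that starts at node 0.

    cost[i][j] is the cost of moving from node i to node j and may differ
    from cost[j][i] (asymmetric). Node 0 stands for the robot's current
    position; the remaining nodes are hypothetical local targets.
    Brute force over all orderings, so only suitable for a few targets.
    """
    n = len(cost)
    if n <= 1:
        return [0], 0.0
    best_order, best_cost = None, float("inf")
    for perm in permutations(range(1, n)):
        order = (0,) + perm
        total = sum(cost[a][b] for a, b in zip(order, order[1:]))
        if total < best_cost:
            best_order, best_cost = order, total
    return list(best_order), best_cost


# Illustrative asymmetric costs (made-up numbers, not from the paper).
cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
order, total = solve_atsp_open(cost)
print(order, total)
```

Asymmetric costs are useful when moving from A to B is not as cheap as moving from B to A, for example when the cost includes the turn needed from the robot's current heading. A practical system would replace the brute-force search with a dedicated solver.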

Abstract

We demonstrate real-time high-altitude aerial metric-semantic mapping and exploration using a monocular camera paired with a global positioning system (GPS) and an inertial measurement unit (IMU). Our system, named HALO, addresses two key challenges: (i) real-time dense 3D reconstruction using vision at large distances, and (ii) mapping and exploration of large-scale outdoor environments with accurate scene geometry and semantics. We demonstrate that HALO can plan informative paths that exploit this information to complete missions with multiple tasks specified in natural language. In simulation-based evaluation across large-scale environments of size up to 78,000 sq. m., HALO consistently completes tasks with less exploration time and achieves up to 68% higher competitive ratio in terms of the distance traveled compared to the state-of-the-art semantic exploration baseline. We use real-world experiments on a custom quadrotor platform to demonstrate that (i) all modules can run onboard the robot, and that (ii) in diverse environments HALO can support effective autonomous execution of missions covering up to 24,600 sq. m. area at an altitude of 40 m. Experiment videos and more details can be found on our project page: https://tyuezhan.github.io/halo/.
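
The competitive ratio mentioned above can be illustrated with a short sketch. The abstract does not define the metric, so this assumes one common convention in which higher is better: the shortest distance achievable with full prior knowledge divided by the distance actually traveled. The function names and all numbers below are hypothetical and are not results from the paper.

```python
def competitive_ratio(optimal_distance, traveled_distance):
    """Ratio in (0, 1]; 1.0 means the robot traveled the optimal distance.

    Assumed definition: optimal (full-knowledge) path length divided by
    the path length the robot actually flew.
    """
    if traveled_distance <= 0:
        raise ValueError("traveled_distance must be positive")
    return optimal_distance / traveled_distance


def relative_gain(ours, baseline):
    """Relative improvement of one ratio over another, e.g. 0.68 for 68%."""
    return (ours - baseline) / baseline


# Made-up distances in meters, for illustration only.
ratio_a = competitive_ratio(200.0, 400.0)  # planner A flew 400 m
ratio_b = competitive_ratio(200.0, 800.0)  # planner B flew 800 m
gain = relative_gain(ratio_a, ratio_b)
print(ratio_a, ratio_b, gain)
```

Under this convention, a statement such as "68% higher competitive ratio" would correspond to `relative_gain` returning 0.68.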

Paper Structure

This paper contains 14 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Conceptual overview of HALO: A high-altitude aerial platform (a) captures monocular aerial imagery (b) as it navigates the environment. HALO constructs a metric-semantic map in real-time (c). Upon this map, a hierarchical planner detects semantically relevant frontiers to make informative plans that balance exploration and exploitation to complete tasks specified in natural language. The semantic map contains dense language embeddings that provide the flexibility to adapt to new tasks with a wide range of semantics in real time. Three tasks are presented in the map above in orange, cyan, and magenta.
  • Figure 2: System overview of HALO: The robot takes in monocular RGB images and GPS measurements. The monocular dense geometric mapping module estimates depth and camera poses from a sequence of images. The metric-semantic mapping module maintains an occupancy grid and a dense semantic feature grid. Frontiers and task relevancy are extracted from the metric-semantic map incrementally. Leveraging both geometric and semantic information, the hierarchical planner balances exploration and exploitation to plan paths for efficient task completion.
  • Figure 3: Monocular dense aerial mapping using feed-forward 3D reconstruction models (F3DR). GPS readings are used to provide position priors G (magenta). The pose graph also maintains ICP (cyan) and F3DR (blue) relative pose estimates as constraints between submaps. Loop closures are implemented with relative pose estimates from F3DR (orange).
  • Figure 4: Visualization of 3D reconstructions of various methods on PolyCity and Forest in Unity.
  • Figure 5: Qualitative results from real-world experiments. The safe and reachable waypoints and experiment areas are shown in (a). The onboard reconstructed maps are shown in (b), with executed paths in red.
  • ...and 1 more figure
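
The Figure 2 caption describes extracting frontiers and task relevancy from an occupancy grid and a semantic feature grid. A minimal sketch of that idea follows, under stated assumptions: a frontier is taken to be a free cell with a 4-connected unknown neighbor, and relevancy is taken to be the cosine similarity between a cell's feature vector and a query embedding. The captions do not specify HALO's actual definitions, and the cell labels, function names, and toy 2-D vectors here are hypothetical stand-ins for real language embeddings.

```python
import math

UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # assumed cell labels


def find_frontiers(grid):
    """Return free cells that touch at least one unknown 4-neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_frontiers(frontiers, features, query):
    """Sort frontier cells by similarity of their feature to the query."""
    return sorted(frontiers, key=lambda rc: cosine(features[rc], query), reverse=True)


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
frontiers = find_frontiers(grid)

# Toy per-cell features and query; a real system would use embeddings
# produced by a vision-language encoder.
features = {(0, 1): [1.0, 0.0], (2, 2): [0.0, 1.0]}
query = [0.0, 1.0]
ranked = rank_frontiers(frontiers, features, query)
print(frontiers, ranked)
```

Ranking frontiers by relevancy is one simple way to bias exploration toward regions likely to contain the queried object while still expanding the known map.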