OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model

Junming Wang, Xiuxian Guan, Zekai Sun, Tianxiang Shen, Dong Huang, Fangming Liu, Heming Cui

TL;DR

OMEGA tackles occlusion-aware navigation for air-ground robots in dynamic, occlusion-heavy environments. It decouples semantic occupancy prediction from geometry using OccMamba, a BEV-fused, multi-branch network with Sem-Mamba and Geo-Mamba blocks, and pairs it with an ESDF-free AGR-Planner based on kinodynamic A* and gradient-based B-spline trajectory optimization. The system delivers real-time occlusion-aware local maps (22.1 FPS) and state-of-the-art mIoU (25.0%) on SemanticKITTI, while achieving high planning success in dynamic simulations (98%) and real-world tests (96%) with notable energy savings (~18%). These results demonstrate practical viability for fast, energy-efficient AGR navigation in visually occluded, rapidly changing environments.

Abstract

Air-ground robots (AGRs) are widely used in surveillance and disaster response because of their mobility and versatility (i.e., they can both fly and drive). Current AGR navigation systems perform well in static, occlusion-prone environments (e.g., indoors): they use 3D semantic occupancy networks to predict occluded regions for complete local mapping and then compute a Euclidean Signed Distance Field (ESDF) for path planning. However, these systems struggle in dynamic scenes with severe occlusion (e.g., crowds) because their perception networks have low prediction accuracy and their path planners incur high computational overhead. In this paper, we propose OMEGA, which combines OccMamba with an efficient AGR-Planner to address these problems. OccMamba adopts a novel architecture that separates semantic and occupancy prediction into independent branches and incorporates two Mamba blocks within these branches. These blocks extract semantic and geometric features from 3D environments with linear complexity, so the network can learn long-range dependencies that improve prediction accuracy. Semantic and geometric features are fused in Bird's Eye View (BEV) space to minimize the computational overhead of feature fusion. The resulting semantic occupancy map is then integrated into the local map, providing occlusion awareness of the dynamic environment. Our AGR-Planner uses this local map and employs kinodynamic A* search and gradient-based trajectory optimization so that planning is ESDF-free and energy-efficient. Extensive experiments demonstrate that OccMamba outperforms state-of-the-art 3D semantic occupancy networks, reaching 25.0% mIoU. End-to-end navigation experiments in dynamic scenes verify OMEGA's efficiency, achieving a 96% average planning success rate. Code and video are available at https://jmwang0117.github.io/OMEGA/.
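
The abstract reports accuracy as mIoU (mean intersection-over-union). As a rough illustration of what that metric measures, the sketch below computes it over flat lists of per-voxel class labels. The function name `mean_iou`, the `ignore_label` parameter, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for this example; the benchmark's official evaluation protocol may differ in such details.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean IoU over classes, given flat per-voxel label sequences.

    Hypothetical helper for illustration only; predictions are assumed
    to lie in range(num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip voxels without a valid ground-truth label
        if p == t:
            # Voxel is in both the predicted and true set of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Voxel is in the predicted set of p and the true set of t.
            union[p] += 1
            union[t] += 1
    # Average only over classes that occur in prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 1, 1, 2]
    target = [0, 1, 2, 2]
    print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1, 1/2, and 1/2, so the mean is 2/3. A reported 25.0% mIoU means the same average, taken over the benchmark's semantic classes, equals 0.25.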

Paper Structure

This paper contains 17 sections, 15 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: OMEGA is the first AGR-specific navigation system that enables occlusion-aware mapping and pathfinding in dynamic scenarios. It integrates OccMamba for real-time obstacle prediction from point clouds and updates local maps accordingly, while AGR-Planner rapidly generates reliable paths using the updated local map.
  • Figure 2: OMEGA system architecture. The perception network (i.e., OccMamba) and AGR-Planner run asynchronously on the onboard computer, connected through the query-based map update method of AGRNav (Wang et al., 2024) so that the local map is updated with prediction results in real time.
  • Figure 3: Overview of the proposed OccMamba. It consists of semantic, geometry and BEV fusion branches. During training, lightweight MLPs serve as auxiliary heads, attached after each encoder block in the semantic and completion branches for voxel predictions. At inference, these heads are detached to keep the network architecture streamlined.
  • Figure 4: AGR-Planner and topological trajectory generation.
  • Figure 5: Qualitative comparison of various models on the SemanticKITTI validation set.
  • ...and 2 more figures
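
The TL;DR describes AGR-Planner as using gradient-based B-spline trajectory optimization. This page does not give the planner's formulation, so the following is only a generic sketch of how a uniform cubic B-spline maps control points to trajectory positions. The function names `cubic_bspline_point` and `sample_trajectory` are hypothetical, and the uniform cubic form is an assumption made for illustration rather than the authors' stated choice.

```python
def cubic_bspline_point(ctrl, seg, u):
    """Evaluate one uniform cubic B-spline segment at u in [0, 1].

    ctrl is a list of equal-length coordinate tuples; the segment uses
    control points ctrl[seg] .. ctrl[seg + 3]. Illustrative helper only.
    """
    p0, p1, p2, p3 = ctrl[seg:seg + 4]
    # Uniform cubic B-spline basis functions; they are non-negative on
    # [0, 1] and sum to 1, so each point is a convex combination of
    # its four control points.
    b0 = (1 - u) ** 3 / 6.0
    b1 = (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0
    b2 = (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0
    b3 = u ** 3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_trajectory(ctrl, samples_per_segment=10):
    """Sample every segment of the spline defined by ctrl."""
    points = []
    for seg in range(len(ctrl) - 3):
        for k in range(samples_per_segment):
            points.append(cubic_bspline_point(ctrl, seg, k / samples_per_segment))
    return points


if __name__ == "__main__":
    control_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 0.0)]
    for point in sample_trajectory(control_points, samples_per_segment=4):
        print(point)
```

Because each trajectory point is a convex combination of four neighbouring control points, moving a control point changes the curve only locally. This locality is one reason gradient-based planners commonly optimize control points directly rather than sampled positions.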