MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin
Sausar Karaf, Mikhail Martynov, Oleg Sautenkov, Zhanibek Darush, Dzmitry Tsetserukou
TL;DR
This work tackles open-world navigation for aerial-ground robots using a single monocular camera, removing the need for depth sensors and extensive retraining. It combines monocular depth estimation with a geometric distance model, refines depth using Depth Anything and Segment Anything, and produces semantically rich maps that support high-level planning and are visualized in a Unity-based digital twin. In simulated search-and-rescue experiments, the system achieved a 97.4% object-detection rate, a 13.6 cm mean position error, and a processing time of about 7.34 s per image, indicating that the approach is feasible in cluttered environments with modest computation. The results suggest that the approach can reduce hardware and bandwidth requirements while enabling richer scene understanding and integration with vision-language modules for decision-making in autonomous aerial-ground navigation.
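To make the position-estimation idea concrete, the sketch below shows one common way to turn a depth map and a segmentation mask into an object position in the camera frame, using standard pinhole back-projection. It is not the paper's geometric distance model, which is not specified in this summary. The function name object_position, the intrinsics fx, fy, cx, cy, and the toy depth values are all hypothetical, and the depth map is assumed to be already scaled to metres.

```python
import numpy as np

def object_position(depth_m, mask, fx, fy, cx, cy):
    """Back-project one masked object to a 3D point in the camera frame.

    depth_m: (H, W) depth map in metres (assumed already metric).
    mask:    (H, W) boolean segmentation mask of a single object.
    fx, fy, cx, cy: pinhole intrinsics in pixels (hypothetical values below).
    """
    vs, us = np.nonzero(mask)  # row (v) and column (u) indices inside the mask
    if us.size == 0:
        raise ValueError("empty mask")
    # The median depth over the mask is robust to a few outlier pixels.
    z = float(np.median(depth_m[vs, us]))
    # Use the mask centroid as the object's image location.
    u, v = us.mean(), vs.mean()
    # Standard pinhole back-projection: pixel offset times depth over focal length.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Toy scene: background at 5 m, one object at 2 m near the image centre.
depth = np.full((480, 640), 5.0)
depth[200:280, 300:340] = 2.0
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:340] = True

pos = object_position(depth, mask, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pos)
```

Taking the median inside the mask, rather than the depth at a single pixel, is one simple way to make the estimate less sensitive to noisy depth near object boundaries. A real system would also need to transform the result from the camera frame into the map frame.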
Abstract
This paper presents a novel mapping approach for a universal aerial-ground robotic system that uses a single monocular camera. The proposed system can detect a diverse range of objects and estimate their positions without fine-tuning for specific environments. Its performance was evaluated in a simulated search-and-rescue scenario in which the MorphoGear robot successfully located a robotic dog while an operator monitored the process. This work contributes to the development of intelligent, multimodal robotic systems capable of operating in unstructured environments.
