SIGN: Safety-Aware Image-Goal Navigation for Autonomous Drones via Reinforcement Learning
Zichen Yan, Rui Huang, Lei He, Shao Guo, Lin Zhao
TL;DR
The work tackles map-less image-goal navigation for autonomous drones, a setting made challenging by the need for high-frequency control and by potential pose-estimation drift. It presents SIGN, a sim-to-real, end-to-end RL framework that outputs continuous velocity commands and integrates a depth-based safety module with action correction for obstacle avoidance. The approach uses self-supervised auxiliary tasks (Future Prediction and RandomShift) to improve representation learning and sample efficiency, achieving state-of-the-art performance on Gibson under continuous control and strong cross-domain generalization to MP3D and HM3D. Real-world experiments support the sim-to-real transfer, the reliability of the safety module, and the feasibility of onboard deployment within the drone's compute budget.
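To make the RandomShift auxiliary task concrete, the following is a minimal sketch that assumes it follows the common pad-and-crop image augmentation used in pixel-based RL. The function name `random_shift`, the `pad` parameter, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np


def random_shift(images, pad=4, rng=None):
    """Randomly translate each image by up to `pad` pixels (hypothetical helper).

    images: array of shape (N, H, W, C).
    Returns an array of the same shape and dtype.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # Replicate border pixels so shifted crops never contain empty regions.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(images)
    for i in range(n):
        # Independent vertical and horizontal offsets in [0, 2 * pad].
        dy, dx = rng.integers(0, 2 * pad + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out
```

Applied to each training batch, a perturbation of this kind exposes the vision backbone to small translations of the same observation, which is one plausible way such an augmentation could improve sample efficiency.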
Abstract
Image-goal navigation (ImageNav) tasks a robot with autonomously exploring an unknown environment and reaching a location that visually matches a given target image. While prior work primarily studies ImageNav for ground robots, enabling this capability for autonomous drones is substantially more challenging because stable flight requires high-frequency feedback control and global localization. In this paper, we propose a novel sim-to-real framework that leverages reinforcement learning (RL) to achieve ImageNav for drones. To strengthen visual representations, our approach trains the vision backbone with auxiliary tasks, including image perturbations and future transition prediction, which leads to more effective policy training. The proposed algorithm enables end-to-end ImageNav with direct velocity control, eliminating the need for external localization. Furthermore, we integrate a depth-based safety module for real-time obstacle avoidance, allowing the drone to navigate safely in cluttered environments. Unlike most existing drone navigation methods, which focus solely on reference tracking or obstacle avoidance, our framework supports comprehensive navigation behaviors, including autonomous exploration, obstacle avoidance, and image-goal seeking, without requiring explicit global mapping. Code and model checkpoints are available at https://github.com/Zichen-Yan/SIGN.
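As an illustration of how a depth-based safety module might correct a policy's velocity command, the sketch below scales the forward velocity by the nearest depth reading in a central image window. The function `correct_action`, the command layout `[vx, vy, vz, yaw_rate]`, the window size, and the thresholds `stop_dist` and `slow_dist` are all assumptions made for this example; the abstract does not specify the actual correction rule.

```python
import numpy as np


def correct_action(depth_m, cmd, stop_dist=0.5, slow_dist=1.5):
    """Scale forward velocity based on the nearest obstacle (hypothetical rule).

    depth_m: (H, W) depth image in meters.
    cmd: [vx, vy, vz, yaw_rate], with vx > 0 meaning forward (assumed layout).
    """
    depth = np.asarray(depth_m, dtype=float)
    h, w = depth.shape
    # Only the central window is checked, approximating the flight corridor.
    roi = depth[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
    valid = roi[np.isfinite(roi) & (roi > 0)]
    # With no valid reading, be conservative and treat the path as blocked.
    nearest = valid.min() if valid.size else 0.0
    # 0 at or below stop_dist, 1 at or beyond slow_dist, linear in between.
    scale = np.clip((nearest - stop_dist) / (slow_dist - stop_dist), 0.0, 1.0)
    out = np.array(cmd, dtype=float)
    if out[0] > 0:
        # Only forward motion is attenuated; other components pass through.
        out[0] *= scale
    return out
```

In this sketch the policy remains free to yaw or retreat when an obstacle is close, so exploration can continue while forward motion is suppressed.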
