Flying in Clutter on Monocular RGB by Learning in 3D Radiance Fields with Domain Adaptation
Xijie Huang, Jinhan Li, Tianyue Wu, Xin Zhou, Zhichao Han, Fei Gao
TL;DR
This work tackles autonomous UAV navigation in clutter using only monocular RGB input. Policies are learned in photorealistic 3D Gaussian Splatting (3DGS) environments, and the sim-to-real gap is bridged with adversarial domain adaptation (DA) and domain randomization (DR). The framework is an end-to-end, RGB-based reinforcement learning (RL) pipeline with an actor-critic architecture and a depth-privileged critic, paired with 3DGS rendering that is accelerated via pruning. The method demonstrates zero-shot transfer to real-world flights under varying obstacle layouts and illumination, supported by ablations and latent-space analyses that clarify the roles of DA and DR in reducing domain shift. The results indicate a practical pathway for monocular RGB navigation on lightweight UAVs and point toward scaling 3DGS-based training to diverse, large-scale datasets and broader deployment.
Abstract
Modern autonomous navigation systems predominantly rely on lidar and depth cameras. However, a fundamental question remains: Can flying robots navigate in clutter using solely monocular RGB images? Given the prohibitive costs of real-world data collection, learning policies in simulation offers a promising path. Yet, deploying such policies directly in the physical world is hindered by the significant sim-to-real perception gap. Thus, we propose a framework that couples the photorealism of 3D Gaussian Splatting (3DGS) environments with Adversarial Domain Adaptation. By training in high-fidelity simulation while explicitly minimizing feature discrepancy, our method ensures the policy relies on domain-invariant cues. Experimental results demonstrate that our policy achieves robust zero-shot transfer to the physical world, enabling safe and agile flight in unstructured environments with varying illumination.
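To make "explicitly minimizing feature discrepancy" concrete, the following is a generic domain-adversarial objective of the kind commonly used for adversarial domain adaptation. It is a sketch under assumptions, not the loss stated in this abstract: the encoder $E_\theta$, the domain discriminator $D_\phi$, the policy loss $\mathcal{L}_{\mathrm{RL}}$, the weight $\lambda$, and the image distributions $p_{\mathrm{sim}}$ and $p_{\mathrm{real}}$ are all illustrative symbols introduced here.

```latex
% Assumed notation (not from the source):
%   E_\theta : image encoder shared by the policy
%   D_\phi   : discriminator predicting "simulated" (1) vs. "real" (0) from features
%   \lambda  : trade-off weight between task and adversarial terms
\begin{align}
\mathcal{L}_{\mathrm{adv}}(\theta,\phi)
  &= -\,\mathbb{E}_{x_s \sim p_{\mathrm{sim}}}\!\left[\log D_\phi\big(E_\theta(x_s)\big)\right]
     -\,\mathbb{E}_{x_r \sim p_{\mathrm{real}}}\!\left[\log\!\big(1 - D_\phi\big(E_\theta(x_r)\big)\big)\right], \\
\phi^{\star}
  &= \arg\min_{\phi}\; \mathcal{L}_{\mathrm{adv}}(\theta,\phi), \\
\theta^{\star}
  &= \arg\min_{\theta}\; \mathcal{L}_{\mathrm{RL}}(\theta) \;-\; \lambda\,\mathcal{L}_{\mathrm{adv}}(\theta,\phi).
\end{align}
```

Under this formulation the discriminator learns to tell simulated features from real ones, while the encoder is pushed to make them indistinguishable. The policy, which acts on those features, is therefore encouraged to rely on cues that are shared across the two domains.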
