Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting
Zinan Lv, Yeqian Qian, Chen Sang, Hao Liu, Danping Zou, Ming Yang
TL;DR
This work tackles high-speed monocular UAV navigation in unstructured outdoor environments by narrowing the visual sim-to-real gap with a photorealistic simulator built on Relightable 3D Gaussian Splatting. The representation explicitly decouples geometry, albedo, and environmental lighting, which allows controllable illumination synthesis and efficient, physically grounded relighting. Combined with an end-to-end RL policy, this supports zero-shot transfer to real forests. A two-stage training curriculum, first under static lighting and then under diverse lighting variations, yields robust, illumination-invariant navigation at speeds up to $10\,\mathrm{m/s}$ in real-world trials. The approach shows potential for lightweight, monocular UAV autonomy in cluttered outdoor settings, with practical implications for search, inspection, and disaster response. Relightable 3DGS balances physical plausibility with rendering speed, enabling large-scale photometric domain adaptation for real-world robotics.
Abstract
UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splatting enables photorealistic scene reconstruction from real-world data, existing methods inherently couple static lighting with geometry, severely limiting policy generalization to dynamic real-world illumination. In this paper, we propose a novel end-to-end reinforcement learning framework designed for effective zero-shot transfer to unstructured outdoor environments. Within a high-fidelity simulation grounded in real-world data, our policy is trained to map raw monocular RGB observations directly to continuous control commands. To overcome photometric limitations, we introduce Relightable 3D Gaussian Splatting, which decomposes scene components to enable explicit, physically grounded editing of environmental lighting within the neural representation. By augmenting training with diverse synthesized lighting conditions ranging from strong directional sunlight to diffuse overcast skies, we compel the policy to learn robust, illumination-invariant visual features. Extensive real-world experiments demonstrate that a lightweight quadrotor achieves robust, collision-free navigation in complex forest environments at speeds up to 10 m/s, exhibiting significant resilience to drastic lighting variations without fine-tuning.
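To make the idea of decoupled albedo and lighting concrete, the sketch below relights a set of points under a simple Lambertian model with one directional "sun" and a constant ambient "sky" term, and samples lighting conditions between overcast and strong sunlight. It is an illustrative assumption only, not the paper's rendering pipeline: the function names `relight` and `sample_lighting`, the shading model, and all numeric ranges are hypothetical.

```python
import numpy as np


def relight(albedo, normals, sun_dir, sun_intensity, ambient):
    """Shade per-point albedo with one directional light plus constant ambient.

    Hypothetical Lambertian model (an assumption, not the paper's renderer):
        color = albedo * (ambient + sun_intensity * max(0, n . l))
    albedo:  (N, 3) RGB reflectance in [0, 1]
    normals: (N, 3) unit surface normals
    """
    sun_dir = np.asarray(sun_dir, dtype=float)
    sun_dir = sun_dir / np.linalg.norm(sun_dir)
    # Cosine term, clamped so back-facing points receive no direct light.
    cos_theta = np.clip(normals @ sun_dir, 0.0, None)
    irradiance = ambient + sun_intensity * cos_theta  # shape (N,)
    return np.clip(albedo * irradiance[:, None], 0.0, 1.0)


def sample_lighting(rng):
    """Draw one lighting condition between overcast (0) and clear sun (1).

    The ranges below are illustrative placeholders.
    """
    sunniness = rng.uniform(0.0, 1.0)
    elevation = rng.uniform(np.deg2rad(10.0), np.deg2rad(80.0))
    azimuth = rng.uniform(0.0, 2.0 * np.pi)
    sun_dir = np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    sun_intensity = sunniness          # strong direct light when clear
    ambient = 0.8 - 0.6 * sunniness    # strong diffuse light when overcast
    return sun_dir, sun_intensity, ambient


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    albedo = np.full((2, 3), 0.5)
    normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
    sun_dir, sun_intensity, ambient = sample_lighting(rng)
    print(relight(albedo, normals, sun_dir, sun_intensity, ambient))
```

Because the albedo stays fixed while only the lighting parameters change, the same reconstructed scene can be re-rendered under many conditions during training, which is the property the abstract relies on for illumination-invariant features.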
