Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting

Zinan Lv, Yeqian Qian, Chen Sang, Hao Liu, Danping Zou, Ming Yang

TL;DR

This work tackles high-speed monocular UAV navigation in unstructured outdoor environments by bridging the visual sim-to-real gap with a photorealistic simulator built on Relightable 3D Gaussian Splatting. By explicitly decoupling geometry, albedo, and environmental lighting, the authors enable controllable illumination synthesis and efficient, physically grounded relighting, which, when combined with an end-to-end RL policy, supports zero-shot transfer to real forests. A two-stage training curriculum—first on static lighting and then with diverse lighting variations—yields robust, illumination-invariant navigation that achieves up to $10\,\mathrm{m/s}$ in real-world trials. The approach demonstrates significant potential for lightweight, monocular UAV autonomy in cluttered outdoor settings, with practical implications for search, inspection, and disaster response. The Relightable 3DGS framework balances physical plausibility with rendering speed, enabling large-scale photometric domain adaptation for real-world robotics.

Abstract

UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splatting enables photorealistic scene reconstruction from real-world data, existing methods inherently couple static lighting with geometry, severely limiting policy generalization to dynamic real-world illumination. In this paper, we propose a novel end-to-end reinforcement learning framework designed for effective zero-shot transfer to unstructured outdoors. Within a high-fidelity simulation grounded in real-world data, our policy is trained to map raw monocular RGB observations directly to continuous control commands. To overcome photometric limitations, we introduce Relightable 3D Gaussian Splatting, which decomposes scene components to enable explicit, physically grounded editing of environmental lighting within the neural representation. By augmenting training with diverse synthesized lighting conditions ranging from strong directional sunlight to diffuse overcast skies, we compel the policy to learn robust, illumination-invariant visual features. Extensive real-world experiments demonstrate that a lightweight quadrotor achieves robust, collision-free navigation in complex forest environments at speeds up to 10 m/s, exhibiting significant resilience to drastic lighting variations without fine-tuning.
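
To make the idea of decoupling albedo from environmental lighting concrete, the following is a minimal sketch that assumes a simple Lambertian model with one directional "sun" term and one constant "sky" term. The abstract does not specify the actual decomposition or shading model used by Relightable 3D Gaussian Splatting, so the function `relight`, the presets `SUNNY` and `OVERCAST`, and all numeric values are hypothetical and purely illustrative.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def relight(albedo, normal, sun_dir, sun_rgb, sky_rgb):
    """Shade one surface point: albedo * (sun * max(0, n.l) + sky).

    Assumed Lambertian model, not the paper's formulation. Albedo and
    normal stay fixed; only the lighting arguments change between presets.
    """
    n = normalize(normal)
    l = normalize(sun_dir)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(
        min(1.0, a * (s * n_dot_l + k))
        for a, s, k in zip(albedo, sun_rgb, sky_rgb)
    )


# Hypothetical lighting presets: strong directional sun vs. diffuse overcast sky.
SUNNY = {
    "sun_dir": (0.3, 0.2, 1.0),
    "sun_rgb": (1.0, 0.95, 0.85),
    "sky_rgb": (0.15, 0.18, 0.25),
}
OVERCAST = {
    "sun_dir": (0.0, 0.0, 1.0),
    "sun_rgb": (0.05, 0.05, 0.05),
    "sky_rgb": (0.55, 0.55, 0.58),
}

albedo = (0.4, 0.3, 0.2)   # illustrative bark-like base color
up, down = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)

sunny_lit = relight(albedo, up, **SUNNY)
sunny_shadow = relight(albedo, down, **SUNNY)
overcast_lit = relight(albedo, up, **OVERCAST)
overcast_shadow = relight(albedo, down, **OVERCAST)

# Contrast between sun-facing and sun-averted surfaces (red channel).
sunny_contrast = sunny_lit[0] - sunny_shadow[0]
overcast_contrast = overcast_lit[0] - overcast_shadow[0]
```

Under these assumptions the same albedo produces high contrast between lit and shadowed surfaces in the sunny preset and nearly uniform shading in the overcast preset, which is the kind of appearance variation a policy would need to become invariant to.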

Paper Structure (20 sections, 12 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: The pipeline of our proposed framework for monocular RGB vision-based autonomous UAV navigation comprises three key stages: 1) Photorealistic Environment Construction: Real-world unstructured scenes are captured and reconstructed using 3D Gaussian Splatting to build a high-fidelity simulator. 2) Sim-to-Real Adaptation: Domain adaptation techniques, including action noise injection, latency simulation, camera pose perturbation, and Relightable 3D Gaussian Splatting, are employed to bridge the visual and dynamics gaps. 3) End-to-end Vision-Based Policy Learning: A reinforcement learning policy processes monocular RGB images and drone state information through CNN and MLP encoders with GRU-based temporal modeling, generating control commands via actor-critic network heads.
  • Figure 2: Visual illustration of the adaptive speed schedule defined in Eq. \ref{eq:speed}. The target forward speed $v_{tar}$ is maximized at $v_{base}$ during straight flight and is smoothly attenuated towards $v_{\min}$ as the yaw rate magnitude $|u|$ approaches the limit $u_{\max}$, preventing sideslip during sharp turns.
  • Figure 3: Examples of photorealistic Relightable 3D Gaussian Splatting. The columns display the original natural light (a) and synthesized variations: overcast (b), cool-toned dusk (c), and warm-toned morning sunlight (d) across different outdoor scenes.
  • Figure 4: Simulation training performance evolution across two stages. The top panel illustrates the mean reward, while the bottom panel displays the navigation success rate over simulation steps. The vertical gray dashed line marks the curriculum transition from Stage 1 (Baseline training, blue curves) to Stage 2 (training with Domain Adaptation, red curves) at approximately 1.6M steps.
  • Figure 5: Real-world flight trajectories across multiple unstructured forest environments. Each subplot shows a successful navigation trial from a distinct location, with the drone's path overlaid in color. The trajectories demonstrate the policy's ability to generalize to various cluttered scenes and execute collision-free navigation.
  • ...and 2 more figures
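
The adaptive speed schedule described in the Figure 2 caption can be sketched as follows. The caption gives only the endpoints ($v_{base}$ at zero yaw rate, $v_{\min}$ at $|u| = u_{\max}$) and says the attenuation is smooth; the paper's actual equation is not reproduced here. The cosine blend below is therefore an assumed functional form, and the function name `target_speed` and the default values for `v_base`, `v_min`, and `u_max` are hypothetical.

```python
import math


def target_speed(yaw_rate, v_base=10.0, v_min=2.0, u_max=1.0):
    """Target forward speed as a function of commanded yaw rate.

    Assumed cosine blend (not the paper's equation):
      |u| = 0       -> v_base
      |u| >= u_max  -> v_min
    with a smooth, monotonically decreasing transition in between.
    """
    ratio = min(abs(yaw_rate) / u_max, 1.0)            # clamp to [0, 1]
    blend = 0.5 * (1.0 + math.cos(math.pi * ratio))    # 1 at ratio=0, 0 at ratio=1
    return v_min + (v_base - v_min) * blend


# Sample the schedule across the yaw-rate range (both turn directions).
samples = [target_speed(u / 10.0) for u in range(-10, 11)]
```

A cosine blend is used in this sketch because it has zero slope at both endpoints, so the speed target changes gently when a turn begins and when the yaw limit is reached; a linear or polynomial ramp would satisfy the same endpoint conditions.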