UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting
Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon
TL;DR
UAVTwin builds photorealistic digital twins from real-world UAV footage to improve UAV perception. It uses Multi-sequence Gaussian Splatting (MsGS) to reconstruct backgrounds with large appearance variation and a mask refinement step to handle dynamic objects. Foreground humans are inserted via Blender with synthetic trajectories and motion from AMASS/SynBody, enabling realistic scene composition and rich ground-truth annotations, while a two-stage training strategy aligns Gaussians with geometry and improves novel-view rendering. Quantitative results show improved neural rendering fidelity and meaningful gains in detection mAP when the augmented data are used to train UAV perception models, although a domain gap remains between synthetic humans and real data. The framework offers a practical pathway for generating diverse, labeled UAV data for perception tasks; future work aims to reduce the remaining domain gap through more realistic avatars and insertion techniques.
Abstract
We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs). Specifically, our approach focuses on synthesizing foreground components, such as various human instances in motion within complex scene backgrounds, from UAV perspectives. This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses. To the best of our knowledge, UAVTwin is the first approach for UAV-based perception that is capable of generating high-fidelity digital twins based on 3DGS. The proposed work significantly enhances downstream models through data augmentation for real-world environments with multiple dynamic objects and significant appearance variations, both of which typically introduce artifacts in 3DGS-based modeling. To tackle these challenges, we propose a novel appearance modeling strategy and a mask refinement module to enhance the training of 3D Gaussian Splatting. We demonstrate the high quality of neural rendering by achieving a 1.23 dB improvement in PSNR compared to recent methods. Furthermore, we validate the effectiveness of data augmentation by showing a 2.5% to 13.7% improvement in mAP for the human detection task.
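To put the reported 1.23 dB PSNR gain in context, the sketch below computes PSNR from its standard definition and converts a dB gain into the corresponding ratio of mean squared errors. It is an illustration only, not the authors' evaluation code: the function names `psnr` and `mse_ratio_for_gain` are hypothetical, and images are assumed to be flattened lists of pixel intensities in the range [0, 1].

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes intensities lie in [0, max_value] (an assumption of this sketch).
    """
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Ratio new_mse / old_mse implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
    print(round(psnr([0.0, 0.0], [0.1, 0.1]), 2))
    # MSE ratio implied by a 1.23 dB gain.
    print(round(mse_ratio_for_gain(1.23), 3))
```

Because PSNR is logarithmic in the mean squared error, a 1.23 dB gain corresponds to an MSE roughly one quarter lower than the baseline, assuming both methods are evaluated on the same images with the same peak value.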
