AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction

Hanyang Liu, Rongjun Qin

TL;DR

AeroDGS combines two modules: a Monocular Geometry Lifting module that reconstructs reliable static and dynamic geometry from a single aerial sequence, providing a robust basis for dynamic estimation, and a Physics-Guided Optimization module that incorporates differentiable ground-support, upright-stability, and trajectory-smoothness priors, transforming ambiguous image cues into physically consistent motion.

Abstract

Recent advances in 4D scene reconstruction have significantly improved dynamic modeling across various domains. However, existing approaches remain limited under aerial conditions with single-view capture, wide spatial range, and dynamic objects of limited spatial footprint and large motion disparity. These challenges cause severe depth ambiguity and unstable motion estimation, making monocular aerial reconstruction inherently ill-posed. To this end, we present AeroDGS, a physics-guided 4D Gaussian splatting framework for monocular UAV videos. AeroDGS introduces a Monocular Geometry Lifting module that reconstructs reliable static and dynamic geometry from a single aerial sequence, providing a robust basis for dynamic estimation. To further resolve monocular ambiguity, we propose a Physics-Guided Optimization module that incorporates differentiable ground-support, upright-stability, and trajectory-smoothness priors, transforming ambiguous image cues into physically consistent motion. The framework jointly refines static backgrounds and dynamic entities with stable geometry and coherent temporal evolution. We additionally build a real-world UAV dataset that spans various altitudes and motion conditions to evaluate dynamic aerial reconstruction. Experiments on synthetic and real UAV scenes demonstrate that AeroDGS outperforms state-of-the-art methods, achieving superior reconstruction fidelity in dynamic aerial environments.

Paper Structure

This paper contains 34 sections, 11 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Summary. Given (a) a monocular aerial video of dynamic urban scenes, AeroDGS reconstructs a physically consistent 4D model by jointly integrating static structures and dynamic motion in a Gaussian representation. The framework (b) performs photorealistic novel-view synthesis with temporally coherent geometry and (c) achieves higher reconstruction fidelity compared to state-of-the-art methods. Please use Adobe Reader / PDF-XChange Editor to see animations.
  • Figure 2: Overview of the proposed AeroDGS. Given a monocular aerial sequence, AeroDGS introduces a Monocular Geometry Lifting module to reconstruct scene geometry and separate dynamic foreground from static background. The recovered seeds are composed and jointly optimized in a unified Gaussian representation. A Physics-Guided Optimization module is proposed to resolve pose ambiguity of dynamic objects under monocular settings, ensuring physically consistent 4D reconstruction.
  • Figure 3: Physics-Guided Optimization. (a) In monocular UAV scenes, dynamic objects exhibit uncertain 3D positions and orientations due to single-view geometry and small image footprints. AeroDGS introduces differentiable physics-guided constraints that enforce (b) ground support, maintaining consistent contact with the local plane; (c) upright stability, aligning the vertical axis with the reference direction; and (d) trajectory smoothness, ensuring continuous acceleration and temporally coherent motion. (e) These constraints transform under-determined poses into a single real-world-consistent configuration, yielding accurate motion recovery and stable optimization.
  • Figure 4: Qualitative comparison of novel-view synthesis results. Our method achieves high overall reconstruction quality on both synthetic and real-world UAV datasets, maintaining high fidelity under diverse altitudes, illumination, and object motion patterns. Sharper structures and more consistent appearance are preserved compared with state-of-the-art methods. Yellow and red rectangular boxes highlight enlarged views of corresponding areas for visual comparison.
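
The three priors named in the Figure 3 caption (ground support, upright stability, trajectory smoothness) can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the plane parameterization n·x + d = 0, the `half_height` parameter, and the choice of penalties (squared gap, one minus cosine, squared second finite difference) are all assumptions made for illustration. Each penalty is a smooth function of the pose, which is the property a gradient-based optimizer would need.

```python
import math

# Illustrative penalties only; names and formulas are assumptions,
# not the losses defined in the AeroDGS paper.

def ground_support_penalty(center, half_height, plane_normal, plane_offset):
    """Squared gap between the object's bottom and the plane n.x + d = 0.

    Assumes plane_normal is unit length and points away from the ground.
    """
    signed_dist = sum(n * c for n, c in zip(plane_normal, center)) + plane_offset
    gap = signed_dist - half_height  # zero when the bottom rests on the plane
    return gap * gap

def upright_penalty(object_up, reference_up):
    """One minus the cosine of the angle between the object's up axis
    and the reference up direction; zero when the two are aligned."""
    dot = sum(a * b for a, b in zip(object_up, reference_up))
    norm = math.sqrt(sum(a * a for a in object_up)) * math.sqrt(
        sum(b * b for b in reference_up)
    )
    return 1.0 - dot / norm

def smoothness_penalty(positions):
    """Mean squared second finite difference (discrete acceleration)
    over a trajectory given as a list of 3D positions."""
    total = 0.0
    count = 0
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        for a, b, c in zip(prev, cur, nxt):
            acc = a - 2.0 * b + c
            total += acc * acc
        count += 1
    return total / count if count else 0.0

if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    # An object floating 0.5 units above the ground plane z = 0.
    print(ground_support_penalty((0.0, 0.0, 1.5), 1.0, up, 0.0))
    # An object tipped onto its side.
    print(upright_penalty((1.0, 0.0, 0.0), up))
    # A trajectory with a sudden speed-up in x.
    print(smoothness_penalty([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]))
```

In a sketch like this, the three terms would be added with hypothetical weights to the rendering loss. The second-difference penalty is one reading of "trajectory smoothness"; penalizing the third difference (jerk) would be another.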