DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair

Weihang Li, Weirong Chen, Shenhan Qian, Jiajie Chen, Daniel Cremers, Haoang Li

TL;DR

Dynamic Gaussian Splatting from An Unposed Image Pair (DynSUP) enables high-fidelity novel-view synthesis of dynamic scenes from two unposed views. It combines an object-level dense bundle adjustment, which recovers the camera pose and per-object motion, with differentiable SE(3) field-driven Gaussian rendering, in which each Gaussian undergoes its own SE(3) transformation for fine-grained motion modeling, along with test-time pose and per-object ratio alignment. Experiments on KITTI and Kubric show consistent improvements over static-scene and pose-dependent baselines, indicating robust dynamic reconstruction from sparse, unposed inputs. The approach extends Gaussian-based neural rendering to dynamic environments where only minimal input is available.

Abstract

Recent advances in 3D Gaussian Splatting have shown promising results. Existing methods typically assume static scenes and/or multiple images with prior poses. Dynamics, sparse views, and unknown poses significantly increase the problem complexity due to insufficient geometric constraints. To overcome this challenge, we propose a method that can use only two images without prior poses to fit Gaussians in dynamic environments. To achieve this, we introduce two technical contributions. First, we propose an object-level two-view bundle adjustment. This strategy decomposes dynamic scenes into piece-wise rigid components, and jointly estimates the camera pose and motions of dynamic objects. Second, we design an SE(3) field-driven Gaussian training method. It enables fine-grained motion modeling through learnable per-Gaussian transformations. Our method leads to high-fidelity novel view synthesis of dynamic scenes while accurately preserving temporal consistency and object motion. Experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art approaches designed for the cases of static environments, multiple images, and/or known poses. Our project page is available at https://colin-de.github.io/DynSUP/.
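
The per-Gaussian transformations mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `apply_se3_field`, the helper `rot_z`, and the array layout are assumptions made for illustration only. It shows the standard way a rigid transform $(R_i, t_i)$ acts on a Gaussian with mean $\mu_i$ and covariance $\Sigma_i$: $\mu_i' = R_i \mu_i + t_i$ and $\Sigma_i' = R_i \Sigma_i R_i^\top$.

```python
import numpy as np

def apply_se3_field(means, covs, rotations, translations):
    """Move each Gaussian with its own rigid transform (hypothetical helper).

    means:        (N, 3)    Gaussian centers
    covs:         (N, 3, 3) Gaussian covariances
    rotations:    (N, 3, 3) one rotation matrix per Gaussian
    translations: (N, 3)    one translation per Gaussian
    """
    # mu' = R mu + t, evaluated independently for every Gaussian
    new_means = np.einsum("nij,nj->ni", rotations, means) + translations
    # Sigma' = R Sigma R^T keeps each covariance symmetric positive semi-definite
    new_covs = rotations @ covs @ np.transpose(rotations, (0, 2, 1))
    return new_means, new_covs

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Two toy Gaussians: the first is rotated 90 degrees and lifted along z,
# the second is only shifted along x.
means = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
covs = np.stack([np.diag([1.0, 4.0, 9.0]), np.eye(3)])
rotations = np.stack([rot_z(np.pi / 2), np.eye(3)])
translations = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.0]])

moved_means, moved_covs = apply_se3_field(means, covs, rotations, translations)
```

Because every Gaussian carries its own $(R_i, t_i)$, neighboring Gaussians on the same object are free to move slightly differently, which is what allows motion finer than a single rigid transform per object.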

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Dynamic Gaussian Splatting from An Unposed Image Pair. Given two images captured at distinct moments with unknown poses in a dynamic environment, our method can fit dynamic Gaussian splatting and then synthesize a new image from a novel viewpoint at a different time.
  • Figure 2: Overview of our DynSUP framework. Given two unposed images, we first perform Object-level Dense Bundle Adjustment to estimate initial camera poses and object motions by decomposing the scene into piece-wise rigid components. The dense 3D Gaussian primitives are initialized with per-object $SE(3)$ transformations. In the $SE(3)$ Field-driven 3DGS stage, we jointly optimize the camera poses, per-Gaussian $SE(3)$ transformations, and Gaussian parameters to reconstruct the dynamic scene. The optimized $SE(3)$ field captures fine-grained motion details while maintaining temporal consistency. Finally, the dynamic scene is rendered using the optimized camera poses and $SE(3)$ field to generate high-quality novel-view synthesis results.
  • Figure 3: Qualitative comparison on the Kubric dataset [greff2021kubric]. Our method produces high-fidelity results for challenging scenes with multiple fast-moving objects.
  • Figure 4: Qualitative comparison on the KITTI dataset [Geiger2012CVPR]. Our method handles complex urban environments with varying object and camera motion better than baseline approaches.
  • Figure 5: Ablation study on the Kubric dataset [greff2021kubric] for $SE(3)$ initialization.
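
The initialization step described in the Figure 2 caption, where Gaussians start from per-object $SE(3)$ transformations before being refined individually, might be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: `init_se3_field`, the integer `object_ids` labels, and the 4x4 homogeneous-matrix layout are all hypothetical, and the source does not specify here how Gaussians are assigned to objects.

```python
import numpy as np

def init_se3_field(object_ids, object_transforms):
    """Give every Gaussian a copy of its object's rigid transform (hypothetical helper).

    object_ids:        (N,)      integer object label per Gaussian (assumed given)
    object_transforms: (K, 4, 4) one homogeneous SE(3) matrix per object
    returns:           (N, 4, 4) per-Gaussian transforms, free to be refined later
    """
    # Indexing with an integer array gathers one matrix per Gaussian;
    # the explicit copy makes it clear that later per-Gaussian updates
    # do not write back into the object-level estimates.
    return object_transforms[object_ids].copy()

# Two objects: object 0 is static (identity), object 1 moves 2 units along z.
object_transforms = np.stack([np.eye(4), np.eye(4)])
object_transforms[1, :3, 3] = [0.0, 0.0, 2.0]

# Four Gaussians, two assigned to each object.
object_ids = np.array([0, 1, 1, 0])
se3_field = init_se3_field(object_ids, object_transforms)
```

Starting from the object-level estimate means every Gaussian on an object initially shares that object's motion, and the subsequent optimization only needs to learn small per-Gaussian deviations from it.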