High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting

Zihao Zou, Ziyuan Qu, Xi Peng, Vivek Boominathan, Adithya Pediredla, Praneeth Chakravarthula

TL;DR

The paper tackles the challenge of capturing high-speed deforming 3D scenes, which exceeds the capabilities of any single imaging modality. It proposes Sensor Fusion Splatting, a deformable 3D Gaussian framework that jointly fuses RGB, event, and depth data, guided by a temporal deformation model, to produce time-resolved, high-fidelity reconstructions. Using a hardware prototype, the authors report higher rendering fidelity and structural accuracy than prior methods on synthetic and real-world datasets, with robust performance under challenging conditions such as low light and small baselines. The approach enables efficient multi-sensor high-speed 3D imaging, with potential for robust novel-view synthesis and lower data demands than traditional ultra-high-speed imaging systems.

Abstract

Capturing and reconstructing high-speed dynamic 3D scenes has numerous applications in computer graphics, vision, and interdisciplinary fields such as robotics, aerodynamics, and evolutionary biology. However, achieving this using a single imaging modality remains challenging. For instance, traditional RGB cameras suffer from low frame rates, limited exposure times, and narrow baselines. To address this, we propose a novel sensor fusion approach using Gaussian splatting, which combines RGB, depth, and event cameras to capture and reconstruct deforming scenes at high speeds. The key insight of our method lies in leveraging the complementary strengths of these imaging modalities: RGB cameras capture detailed color information, event cameras record rapid scene changes with microsecond resolution, and depth cameras provide 3D scene geometry. To unify the underlying scene representation across these modalities, we represent the scene using deformable 3D Gaussians. To handle rapid scene movements, we jointly optimize the 3D Gaussian parameters and their temporal deformation fields by integrating data from all three sensor modalities. This fusion enables efficient, high-quality imaging of fast and complex scenes, even under challenging conditions such as low light, narrow baselines, or rapid motion. Experiments on synthetic and real datasets captured with our prototype sensor fusion setup demonstrate that our method significantly outperforms state-of-the-art techniques, achieving noticeable improvements in both rendering fidelity and structural accuracy.
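To make the fusion idea concrete, the following is a minimal sketch of how supervision from the three modalities could be combined into a single objective. It is not the paper's loss: the function name `fused_loss`, the L1 terms, the weights, and the contrast threshold value are all assumptions chosen for illustration. The event term uses the standard event-camera model, in which the signed event count at a pixel, multiplied by a contrast threshold, approximates the change in log intensity between two timestamps.

```python
import numpy as np

def fused_loss(rgb_pred, rgb_gt, depth_pred, depth_gt,
               rgb_pred_t0, rgb_pred_t1, event_count,
               contrast_threshold=0.2,           # assumed value, sensor dependent
               w_rgb=1.0, w_event=1.0, w_depth=1.0,  # assumed weights
               eps=1e-6):
    """Hypothetical combined objective over RGB, event, and depth supervision.

    rgb_pred, rgb_gt:          (H, W, 3) rendered and captured color frames
    depth_pred, depth_gt:      (H, W) rendered and measured depth maps
    rgb_pred_t0, rgb_pred_t1:  (H, W, 3) renders at two nearby timestamps
    event_count:               (H, W) signed sum of event polarities in [t0, t1]
    """
    # Photometric term: L1 difference between rendered and captured color.
    l_rgb = np.mean(np.abs(rgb_pred - rgb_gt))

    # Geometric term: L1 difference between rendered and measured depth.
    l_depth = np.mean(np.abs(depth_pred - depth_gt))

    # Event term: the rendered log-intensity change between t0 and t1 should
    # match the change implied by the accumulated events.
    lum0 = rgb_pred_t0.mean(axis=-1)
    lum1 = rgb_pred_t1.mean(axis=-1)
    pred_dlog = np.log(lum1 + eps) - np.log(lum0 + eps)
    meas_dlog = contrast_threshold * event_count
    l_event = np.mean(np.abs(pred_dlog - meas_dlog))

    total = w_rgb * l_rgb + w_event * l_event + w_depth * l_depth
    return {"rgb": l_rgb, "event": l_event, "depth": l_depth, "total": total}


if __name__ == "__main__":
    h, w = 4, 4
    frame = np.full((h, w, 3), 0.5)
    depth = np.ones((h, w))
    events = np.full((h, w), 2.0)
    # A frame brightened by exp(0.2 * 2) is consistent with two positive events.
    brighter = frame * np.exp(0.2 * 2.0)
    print(fused_loss(frame, frame, depth, depth, frame, brighter, events))
```

In a real pipeline the rendered quantities would come from a differentiable Gaussian rasterizer evaluated at deformed Gaussian positions, and the loss would be written in an autodiff framework so that gradients reach both the Gaussian parameters and the deformation field; NumPy is used here only to keep the sketch self-contained.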

Paper Structure

This paper contains 13 sections, 13 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Sensor Fusion Imaging Prototype. Our sensor fusion system consists of RGB, event and depth cameras providing complementary sensory information. A high-speed deforming object placed on a rotating turntable is imaged by these cameras, as illustrated, and the proposed sensor fusion splatting approach is used for reconstruction. The complementary modality sensors and turntable are calibrated to ensure accurate alignment.
  • Figure 2: Comparison of Depth Map Reconstructions. We present a rendered depth map from an unseen camera viewpoint. Our method achieves the closest approximation to the ground truth depth compared to existing approaches.
  • Figure 3: Evaluating Structural Accuracy. We present images rendered from a viewpoint outside the training camera positions and a top-down view of the extracted point cloud. Our method accurately positions the red ball and captures its 3D structure, unlike other methods that either misplace it, represent it incompletely, or merge it with the green ball.
  • Figure 4: Synthetic Evaluations. We visualize four dynamic scenes synthesized using our method and two competing baselines (Deformable [yang2024deformable] and 4DGS [wu20244d]). Our method consistently outperforms the baseline methods, capturing finer appearance details.
  • Figure 5: Evaluation under Varying Baselines and Training Frames. We demonstrate the visual performance of our method under varying baselines and numbers of training frames. Our method achieves robust reconstruction quality even with small baselines and maintains high performance with sparse training samples.
  • ...and 2 more figures