E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting

Sohaib Zahid, Viktor Rudnev, Eddy Ilg, Vladislav Golyanik

TL;DR

This work tackles the limitations of RGB-based novel view synthesis in challenging lighting and high-speed scenes by introducing E-3DGS, which uses 3D Gaussian splatting supervised by color event streams to render large-scale, unbounded scenes. The method integrates frustum-based initialization, adaptive event windows, isotropic Gaussian regularization, and pose refinement via Gram-Schmidt to achieve fast training and rendering while maintaining high visual fidelity. Empirical results on real and synthetic datasets show that E-3DGS outperforms EventNeRF by 11–25% in PSNR and operates orders of magnitude faster, with robust performance across diverse scenes. The work also contributes new real and synthetic event datasets, establishing a scalable framework for event-based view synthesis with practical implications for robotics, AR/VR, and large-scale visualization.
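
The Gram-Schmidt step named above can be illustrated with a minimal sketch. It assumes the refined rotation is parameterized by two unconstrained 3-vectors that are orthonormalized into a rotation matrix, which is one common way to keep a rotation valid during gradient-based optimization. This summary does not state the exact parameterization used by E-3DGS, so the function name `gram_schmidt_rotation` and its inputs are hypothetical.

```python
import numpy as np

def gram_schmidt_rotation(a1, a2):
    """Map two unconstrained 3-vectors to a 3x3 rotation matrix.

    Assumption: illustrative parameterization only, not taken from the paper.
    The columns of the result are the orthonormal basis (b1, b2, b3).
    """
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    # First basis vector: normalize a1.
    b1 = a1 / np.linalg.norm(a1)
    # Second basis vector: remove the component of a2 along b1, then normalize.
    u2 = a2 - np.dot(b1, a2) * b1
    b2 = u2 / np.linalg.norm(u2)
    # Third basis vector: the cross product completes a right-handed frame.
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)

# Usage: slightly perturbed axes still yield a valid rotation.
R = gram_schmidt_rotation([1.0, 0.1, 0.0], [0.2, 1.0, 0.3])
```

Because the output is orthonormal by construction, an optimizer can update the two input vectors freely without an explicit orthogonality constraint.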

Abstract

Novel view synthesis techniques predominantly utilize RGB cameras, inheriting their limitations such as the need for sufficient lighting, susceptibility to motion blur, and restricted dynamic range. In contrast, event cameras are significantly more resilient to these limitations but have been less explored in this domain, particularly in large-scale settings. Current methodologies primarily focus on front-facing or object-oriented (360-degree view) scenarios. For the first time, we introduce 3D Gaussians for event-based novel view synthesis. Our method reconstructs large and unbounded scenes with high visual quality. We contribute the first real and synthetic event datasets tailored for this setting. Our method demonstrates superior novel view synthesis and consistently outperforms the baseline EventNeRF by a margin of 11-25% in PSNR (dB) while being orders of magnitude faster in reconstruction and rendering.

Paper Structure

This paper contains 36 sections, 12 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Overview of our E-3DGS Method. We use 3D Gaussians \cite{3dgs} as the scene representation and assume that initial noisy camera poses are available. We randomly initialize the scene with our frustum-based initialization (Sec. \ref{sec:frustum_init}) and then optimize the Gaussians and the camera poses jointly (Sec. \ref{sec:pose_refinement}). To obtain a high-quality reconstruction of both low-frequency structure and high-frequency detail, we propose a strategy using a large event window from $t_{s_1}$ to $t$ and a small one from $t_{s_2}$ to $t$ (Sec. \ref{subsec:adaptive_window}). We then define the loss $\mathcal{L}_\mathrm{recon}$ (Sec. \ref{ssec:Optimization}) between renderings from our model at the current time $t$ (indicated green) and previous times $t_{s_1}$ (indicated orange) and $t_{s_2}$ (indicated red), and the accumulated incoming events $E(t_{s_1},t)$ and $E(t_{s_2},t)$. We regularize the 3D Gaussians with the loss $\mathcal{L}_\mathrm{iso}$ (Sec. \ref{ssec:IsotropicReg}).
  • Figure 2: Two different views of the scene with inanimate objects assembled in the multi-view studio of MPI for Informatics.
  • Figure 3: Comparison of E-3DGS against the baselines and ablation study on the E-3DGS-Real dataset. Deblur-GS, E2VID + 3DGS and EventNeRF suffer from various issues including blurring, floaters, and noise. In contrast, our method delivers clear details, such as the intricate structure of the sculpture's face.
  • Figure 4: Comparison of E-3DGS vs. EventNeRF on the synthetic EventNeRF dataset. EventNeRF struggles with noise in the Drums sequence, blurriness in Ficus, and background artifacts in Lego and Materials sequences, while E-3DGS handles these issues well.
  • Figure 5: Comparison of E-3DGS vs. Robust E-NeRF on the TUM-VIE dataset. While Robust E-NeRF achieves higher local contrast, it suffers from globally inconsistent brightness. E-3DGS produces consistent brightness across the scene, albeit with some detail loss (e.g., in the table texture of the mocap-desk2 sequence).
  • ...and 5 more figures
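
The two-window supervision described in the Figure 1 caption can be sketched as follows. This is a simplified illustration, not the paper's definition of $\mathcal{L}_\mathrm{recon}$. It assumes grayscale images, a single fixed contrast threshold, and a mean-squared error between the rendered log-intensity change and the accumulated events. The names `accumulate_events` and `event_recon_loss` and the threshold value are hypothetical.

```python
import numpy as np

def accumulate_events(events, t_start, t_end, height, width):
    """Sum event polarities per pixel over the window (t_start, t_end].

    Each event is a tuple (x, y, t, polarity) with polarity in {+1, -1}.
    """
    acc = np.zeros((height, width), dtype=float)
    for x, y, t, p in events:
        if t_start < t <= t_end:
            acc[y, x] += p
    return acc

def event_recon_loss(img_t, img_ts, acc, threshold=0.25, eps=1e-6):
    """MSE between the rendered log-intensity change and threshold-scaled events.

    Assumption: 'threshold' stands in for the event camera's contrast
    threshold; its value here is illustrative only.
    """
    pred = np.log(img_t + eps) - np.log(img_ts + eps)
    target = threshold * acc
    return float(np.mean((pred - target) ** 2))

# Toy 2x2 example with a long window (t_s1, t] and a short window (t_s2, t].
events = [(0, 0, 0.5, +1), (0, 0, 0.8, +1), (1, 1, 0.2, -1)]
t, t_s1, t_s2 = 1.0, 0.0, 0.6
acc_long = accumulate_events(events, t_s1, t, 2, 2)
acc_short = accumulate_events(events, t_s2, t, 2, 2)

# Stand-ins for renderings: img_t is built to be consistent with acc_long.
img_ts1 = np.ones((2, 2))
img_t = np.exp(0.25 * acc_long)
loss_long = event_recon_loss(img_t, img_ts1, acc_long)
```

In this toy setup the long window sees all three events while the short window sees only the most recent one, which mirrors the caption's split between low-frequency structure and high-frequency detail.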