Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors

Zixin Zhang, Kanghao Chen, Lin Wang

TL;DR

Elite-EvGS tackles the challenge of reconstructing 3D scenes from asynchronous event streams by distilling priors from pre-trained Event-to-Video models into a 3D Gaussian Splatting framework. It introduces a warm-up initialization that uses E2V-generated frames to bootstrap a coarse 3DGS, followed by event-driven refinement with a progressive, counts-based event supervision strategy. The method combines an event-based loss and a regularization term derived from E2V priors to stabilize optimization and improve detail preservation. Across synthetic and real-world datasets, Elite-EvGS achieves state-of-the-art performance for event-only 3D reconstruction and demonstrates robustness under fast motion and low-light conditions.
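
To make the combination of an event-based loss and an E2V-derived regularizer more concrete, here is a minimal NumPy sketch. It relies on the standard event generation model, in which the signed event count at a pixel times a contrast threshold approximates the change in log brightness between two timestamps. The function names, the threshold value, the L1 form of the regularizer, and the weight `lam` are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def accumulate_events(xs, ys, ps, height, width):
    """Sum signed polarities (+1 / -1) per pixel into an event-count frame."""
    frame = np.zeros((height, width), dtype=np.float64)
    # np.add.at performs unbuffered addition, so repeated pixels accumulate.
    np.add.at(frame, (ys, xs), ps)
    return frame

def event_loss(render_start, render_end, event_frame, threshold=0.25, eps=1e-5):
    """MSE between the rendered log-brightness change and the change implied by events.

    `threshold` is an assumed contrast threshold, not a value from the paper.
    """
    predicted = np.log(render_end + eps) - np.log(render_start + eps)
    target = threshold * event_frame
    return float(np.mean((predicted - target) ** 2))

def total_loss(render_start, render_end, event_frame, e2v_frame, lam=0.1):
    """Event loss plus a hypothetical L1 regularizer toward an E2V-generated frame."""
    reg = float(np.mean(np.abs(render_end - e2v_frame)))
    return event_loss(render_start, render_end, event_frame) + lam * reg
```

In a real pipeline the two renders would come from the 3DGS rasterizer at the camera poses bounding the event window, and the loss would be written in an autodiff framework so gradients reach the Gaussian parameters.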

Abstract

Event cameras are bio-inspired sensors that output asynchronous and sparse event streams instead of fixed-rate frames. Benefiting from distinct advantages such as high dynamic range and high temporal resolution, event cameras have been applied to 3D reconstruction, which is important for robotic mapping. Recently, neural rendering techniques such as 3D Gaussian splatting (3DGS) have proven successful in 3D reconstruction. However, how to develop an effective event-based 3DGS pipeline remains under-explored. In particular, because 3DGS typically depends on high-quality initialization and dense multiview constraints, optimizing 3DGS from events is difficult given their inherent sparsity. To this end, we propose a novel event-based 3DGS framework, named Elite-EvGS. Our key idea is to distill the prior knowledge from off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events in a coarse-to-fine optimization manner. Specifically, to address the complexity of 3DGS initialization from events, we introduce a novel warm-up initialization strategy that optimizes a coarse 3DGS from the frames generated by E2V models and then incorporates events to refine the details. Then, we propose a progressive event supervision strategy that employs a window-slicing operation to progressively reduce the number of events used for supervision. This relieves the temporal randomness of the event frames, benefiting the optimization of both local textural and global structural details. Experiments on the benchmark datasets demonstrate that Elite-EvGS can reconstruct 3D scenes with better textural and structural details. Meanwhile, our method yields plausible performance on captured real-world data, including diverse challenging conditions such as fast motion and low-light scenes.
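
The progressive event supervision described in the abstract can be pictured as a schedule that shrinks the number of events per supervision window as training proceeds. The sketch below is one possible reading, under stated assumptions: the linear decay, the function names, and the parameters `n_start` and `n_end` are hypothetical and do not come from the paper, which only says that the event count is progressively reduced via window slicing.

```python
import numpy as np

def events_per_window(step, total_steps, n_start, n_end):
    """Linearly interpolate the window size from n_start (first step) to n_end (last step)."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return int(round(n_start + frac * (n_end - n_start)))

def slice_window(events, end_idx, count):
    """Return up to `count` consecutive events ending at `end_idx` (exclusive)."""
    start_idx = max(end_idx - count, 0)
    return events[start_idx:end_idx]

# Usage: early steps see large windows, later steps see small ones.
events = np.arange(2000)  # stand-in for a time-sorted event array
for step in (0, 5, 10):
    n = events_per_window(step, total_steps=11, n_start=1000, n_end=100)
    window = slice_window(events, end_idx=1500, count=n)
    print(step, n, len(window))
```

Large windows early on accumulate many events and so constrain global structure, while the smaller windows used later isolate finer brightness changes; this matches the coarse-to-fine intent stated in the abstract, although the exact schedule used by the authors may differ.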

Paper Structure (17 sections, 5 equations, 7 figures, 4 tables)

This paper contains 17 sections, 5 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Methodological and qualitative comparison among the E2VID-based baseline, EventNeRF, and our proposed Elite-EvGS. (a) A baseline that employs E2V models first reconstructs video frames from events and then generates the 3D scene. (b) EventNeRF, on the other hand, relies solely on an event loss to learn a NeRF representation. (c) We propose an event-based 3DGS approach that distills the geometric priors from the E2VID models to effectively reconstruct 3D scenes from events (with better LPIPS scores and lower training time than EventNeRF).
  • Figure 2: Overview of our Elite-EvGS framework. It takes the event stream as input and outputs a trained set of Gaussians. We first utilize E2VID models to initialize 3D Gaussians (see Sec. \ref{sec:e2v}). Then, we propose an adaptive event loss to supervise 3D Gaussians with events directly (see Sec. \ref{sec:event_optimization}).
  • Figure 3: Illustration and visualization of our warm-up initialization strategy.
  • Figure 4: Qualitative comparison of different methods on the EventNeRF dataset. Our method outperforms the baselines, including EventNeRF and E2VID + 3DGS, with better textural and geometric details.
  • Figure 5: Qualitative comparison on our captured real-world dataset. The input is event data. We compare against EventNeRF on the ability to synthesize novel views. Our method yields better real-world novel view synthesis than EventNeRF.
  • ...and 2 more figures