IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

Jian Huang, Chengrui Dong, Xuanhua Chen, Peidong Liu

TL;DR

IncEventGS introduces a pose-free, incremental 3D Gaussian Splatting system trained solely on a single event camera. By processing the event stream in chunks and optimizing both the 3D Gaussian representation and a continuous $SE(3)$ camera trajectory via an event-based loss and SSIM term, it achieves high-quality novel view synthesis and accurate motion estimation without ground-truth poses. The method bootstraps with random Gaussians and depth-based reinitialization, grows the map incrementally with a visibility-guided expansion, and performs dense bundle adjustment in a sliding window, yielding strong results against state-of-the-art event-based NeRFs and VO baselines on Replica and TUM-VIE datasets. It also demonstrates extensions to color events and fast-motion scenarios, with substantial speed advantages over frame-based NeRF approaches, highlighting practical applicability for real-time event-based 3D reconstruction.
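The event-based loss mentioned above compares a measured brightness change, obtained by integrating events between two nearby timestamps, with the change between two rendered brightness images. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes the standard event generation model (each event signals a log-brightness change of one contrast threshold) and a plain mean-squared error. The function names, the `contrast_threshold` value, and the `eps` stabilizer are illustrative assumptions, and the SSIM term is omitted.

```python
import numpy as np


def accumulate_events(xs, ys, ts, ps, t_start, t_end, height, width,
                      contrast_threshold=0.1):
    """Integrate events in [t_start, t_end) into a per-pixel log-brightness change.

    xs, ys: integer pixel coordinates; ts: timestamps; ps: polarities (+1 / -1).
    contrast_threshold is an assumed sensor parameter, not a value from the paper.
    """
    mask = (ts >= t_start) & (ts < t_end)
    image = np.zeros((height, width), dtype=np.float64)
    # np.add.at accumulates correctly even when a pixel index repeats.
    np.add.at(image, (ys[mask], xs[mask]), ps[mask])
    return contrast_threshold * image


def event_loss(render_start, render_end, measured_change, eps=1e-5):
    """Mean-squared error between synthesized and measured log-brightness change.

    render_start / render_end stand in for brightness images rendered at the
    two sampled timestamps; eps avoids log(0) and is an assumption.
    """
    synthesized = np.log(render_end + eps) - np.log(render_start + eps)
    return float(np.mean((synthesized - measured_change) ** 2))


# Toy usage: four events on a 2x3 sensor, integrated over a short window.
xs = np.array([0, 1, 1, 2])
ys = np.array([0, 0, 0, 1])
ts = np.array([0.00, 0.01, 0.02, 0.50])
ps = np.array([1, 1, -1, 1])
measured = accumulate_events(xs, ys, ts, ps, 0.0, 0.03, 2, 3,
                             contrast_threshold=0.2)

render_a = np.full((2, 3), 0.5)
render_b = render_a * np.exp(measured)  # renders that explain the events
loss_consistent = event_loss(render_a, render_b, measured)
loss_static = event_loss(render_a, render_a, measured)
```

In this toy case the pair of renders whose log-ratio matches the accumulated events gives a loss close to zero, while two identical renders are penalized for failing to explain the event at pixel (0, 0). In a system of the kind described here, the gradient of such a loss would flow to both the Gaussians and the camera poses at the two timestamps.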

Abstract

Implicit neural representations and explicit 3D Gaussian Splatting (3D-GS) for novel view synthesis have recently achieved remarkable progress with frame-based cameras (e.g. RGB and RGB-D cameras). Compared to frame-based cameras, event cameras, a novel type of bio-inspired visual sensor, offer high temporal resolution, high dynamic range, low power consumption, and low latency. Due to their unique asynchronous and irregular data capturing process, limited work has been proposed to apply neural representations or 3D Gaussian Splatting to event cameras. In this work, we present IncEventGS, an incremental 3D Gaussian Splatting reconstruction algorithm with a single event camera. To recover the 3D scene representation incrementally, we exploit the tracking and mapping paradigm of conventional SLAM pipelines for IncEventGS. Given the incoming event stream, the tracker first estimates an initial camera motion based on the previously reconstructed 3D-GS scene representation. The mapper then jointly refines both the 3D scene representation and camera motion based on the previously estimated motion trajectory from the tracker. The experimental results demonstrate that IncEventGS delivers superior performance compared to prior NeRF-based methods and other related baselines, even without ground-truth camera poses. Furthermore, our method also delivers better performance than state-of-the-art event visual odometry methods in terms of camera motion estimation. Code is publicly available at: https://github.com/wu-cvgl/IncEventGS.

Paper Structure

This paper contains 21 sections, 14 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The pipeline of IncEventGS. IncEventGS processes the incoming event stream by dividing it into chunks and representing the camera trajectory as a continuous model. It randomly samples two close consecutive timestamps and integrates the events between them. Two brightness images are rendered from 3D-GS at the corresponding poses, and we minimize the photometric loss between the synthesized and measured brightness change. During initialization, a pre-trained depth estimation model estimates depth from the rendered images to bootstrap the system.
  • Figure 2: Qualitative evaluation of novel view image synthesis on the Replica dataset. The experimental results demonstrate that our method renders higher-quality images with fewer artifacts compared to event-based NeRF and two-stage approaches.
  • Figure 3: Qualitative evaluation of novel view image synthesis on the real dataset. Our method renders better images with fewer artifacts than event NeRF methods and two-stage methods. Note that there are no GT images aligned with the event camera, so we choose the closest images from the RGB camera and crop them to the same size as the rendered images for visual comparison.
  • Figure 4: Representative visualization of ATE mapped onto trajectories for the synthetic (office0) and real (6dof) datasets, generated by the EVO toolbox using the same ground-truth poses, demonstrating the superior performance of our method in pose estimation.
  • Figure 5: The re-initialization process of IncEventGS.
  • ...and 2 more figures