EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

Bohao Liao, Wei Zhai, Zengyu Wan, Zhixin Cheng, Wenfei Yang, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha

TL;DR

This work tackles robust 3D scene reconstruction from high-speed, free-trajectory video by integrating asynchronous event camera data into 3D Gaussian Splatting (3DGS). It combines three components: supervision through the Event Generation Model (EGM), piece-wise Contrast Maximization (CMax) paired with the Linear Event Generation Model (LEGM), and photometric bundle adjustment (PBA). A two-stage training scheme, whose second stage uses the Fixed-GS strategy, jointly optimizes camera poses and the scene while recovering color. Across Tanks and Temples and RealEv-DAVIS, EF-3DGS consistently outperforms frame-only and prior event-based baselines in both rendering quality (PSNR/SSIM/LPIPS) and trajectory accuracy (ATE/RPE), particularly at low frame rates. The results indicate that event data can substantially alleviate pose ambiguity and sparse-view limitations in high-speed capture, enabling more reliable, high-fidelity reconstructions with practical compute requirements.

Abstract

Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. Our project page is https://lbh666.github.io/ef-3dgs/.
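
The first two components described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `(x, y, t, p)` event layout, the contrast threshold value, and the single constant flow per event window are all assumptions made for illustration. `egm_residual` expresses the standard event generation model (the log-brightness change between two rendered views should match the accumulated event polarities scaled by a contrast threshold), and `iwe_contrast` warps events to a reference time and scores the resulting Image of Warped Events by its variance, which is the quantity CMax maximizes.

```python
import numpy as np

def accumulate_events(events, shape):
    """Sum polarities per pixel. `events` rows are (x, y, t, p) with p in {-1, +1}."""
    frame = np.zeros(shape, dtype=np.float64)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    np.add.at(frame, (ys, xs), events[:, 3])  # handles repeated pixels correctly
    return frame

def egm_residual(render_t0, render_t1, events, threshold=0.2, eps=1e-6):
    """Event generation model residual between two rendered grayscale views.

    `threshold` is a hypothetical contrast threshold; real sensors need calibration.
    """
    predicted = np.log(render_t1 + eps) - np.log(render_t0 + eps)
    observed = threshold * accumulate_events(events, render_t0.shape)
    return predicted - observed

def iwe_contrast(events, flow, t_ref, shape):
    """Warp events to `t_ref` along one constant flow (vx, vy) in px/s; return IWE variance.

    A single flow per window is a simplification assumed for this sketch.
    """
    dt = t_ref - events[:, 2]
    xs = np.rint(events[:, 0] + flow[0] * dt).astype(int)
    ys = np.rint(events[:, 1] + flow[1] * dt).astype(int)
    inside = (xs >= 0) & (xs < shape[1]) & (ys >= 0) & (ys < shape[0])
    iwe = np.zeros(shape, dtype=np.float64)
    np.add.at(iwe, (ys[inside], xs[inside]), 1.0)
    return iwe.var()

# Toy data: one point moving right at 10 px/s, firing a positive event per pixel.
events = np.array([[2, 1, 0.0, 1], [3, 1, 0.1, 1], [4, 1, 0.2, 1], [5, 1, 0.3, 1]])
shape = (3, 8)
sharp = iwe_contrast(events, flow=(10.0, 0.0), t_ref=0.3, shape=shape)
blurry = iwe_contrast(events, flow=(0.0, 0.0), t_ref=0.3, shape=shape)
```

With the correct flow all four events land on the same pixel, so `sharp` exceeds `blurry`; searching for the motion that maximizes this contrast is the basic idea behind using CMax to calibrate poses.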

Paper Structure

This paper contains 33 sections, 19 equations, 20 figures, 12 tables, 1 algorithm.

Figures (20)

  • Figure 1: Free-trajectory 3DGS under high speed. (Top) The overall paradigm. The colored dots in the top row represent the event data (red: positive, blue: negative). We leverage continuous event streams to aid discrete video frames captured along free trajectories in high-speed scenarios, jointly optimizing camera poses and reconstructing the 3DGS. Our method surpasses current state-of-the-art methods in terms of both rendered results (middle) and pose estimation (bottom).
  • Figure 2: Method overview. The inputs are video frames and an event stream. In the first stage, we progressively add new event images, leveraging the events and most recent frame to establish the event-driven optimization. In the second stage, we adopt the Fixed-GS strategy to mitigate the color distortion of 3DGS. The details of $\mathcal{L}_{LEGM}$ and CMax framework are shown in Fig. 3.
  • Figure 3: Illustration of the unified CMax and LEGM optimization. We warp previous event frames to the sampled timestamp through the optical flow and maximize the sharpness of the IPWE. As a byproduct, the IPWE is also used to establish additional constraints on 3DGS.
  • Figure 4: Qualitative comparison for novel view synthesis. The first two rows are from Tanks and Temples and the last row is from RealEv-DAVIS. Our approach produces more realistic rendering results with fine-grained details. Better viewed when zoomed in.
  • Figure 5: Pose estimation comparison. We visualise the trajectory (3D plot) and $\mathrm{RPE}_r$ (color bar) of each method. We clip and normalize the $\mathrm{RPE}_r$ by a quarter of the max $\mathrm{RPE}_r$ across all results of each scene.
  • ...and 15 more figures
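
The color-bar scaling described in the Figure 5 caption can be sketched as follows. This is one reading of the caption, not the authors' plotting code: the function name and the dictionary input are hypothetical, and the sketch assumes the cap is a quarter of the largest $\mathrm{RPE}_r$ over all methods within a single scene.

```python
import numpy as np

def normalize_rpe_for_colorbar(rpe_by_method):
    """Clip per-frame RPE_r at a quarter of the scene-wide max, then scale to [0, 1].

    `rpe_by_method` maps a method name to its per-frame RPE_r values for one scene.
    """
    values = {name: np.asarray(v, dtype=np.float64) for name, v in rpe_by_method.items()}
    scene_max = max(float(v.max()) for v in values.values())
    cap = scene_max / 4.0
    if cap == 0.0:
        # All errors are zero; avoid dividing by zero.
        return {name: np.zeros_like(v) for name, v in values.items()}
    return {name: np.clip(v, 0.0, cap) / cap for name, v in values.items()}

# Toy example with made-up numbers for two hypothetical methods.
colors = normalize_rpe_for_colorbar({"a": [0.0, 1.0, 8.0], "b": [2.0, 4.0]})
```

Clipping at a quarter of the maximum keeps a few large outliers from compressing the rest of the color range, so small differences between methods stay visible.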