E2EGS: Event-to-Edge Gaussian Splatting for Pose-Free 3D Reconstruction

Yunsoo Kim, Changki Sung, Dasol Hong, Hyun Myung

Abstract

The emergence of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) has advanced novel view synthesis (NVS). These methods, however, require high-quality RGB inputs and accurate corresponding camera poses, which limits their robustness under real-world conditions such as fast camera motion or adverse lighting. Event cameras, which capture per-pixel brightness changes with high temporal resolution and wide dynamic range, enable precise sensing of dynamic scenes and offer a promising solution. However, existing event-based NVS methods either assume known poses or rely on depth estimation models that are bounded by their initial observations and fail to generalize as the camera traverses previously unseen regions. We present E2EGS, a pose-free framework that operates solely on event streams. Our key insight is that edge information provides rich structural cues essential for accurate trajectory estimation and high-quality NVS. To extract edges from noisy event streams, we exploit the distinct spatio-temporal characteristics of edge and non-edge regions: the motion of the event camera induces consistent events along edges, whereas non-edge regions produce only sparse noise. We leverage this through a patch-based temporal coherence analysis that measures local variance to extract edges while robustly suppressing noise. The extracted edges guide structure-aware Gaussian initialization and enable edge-weighted losses throughout initialization, tracking, and bundle adjustment. Extensive experiments on both synthetic and real datasets demonstrate that E2EGS achieves superior reconstruction quality and trajectory accuracy, establishing a fully pose-free paradigm for event-based 3D reconstruction.
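The abstract describes the edge extractor only at a high level. The sketch below illustrates one way a patch-based temporal coherence test over consecutive event maps could separate edges from sparse noise, followed by an edge-weighted photometric loss. It is an illustration under stated assumptions, not the authors' implementation: the activity measure (fraction of event-active pixels in a patch), the thresholds `min_mean` and `max_var`, the weighting `1 + lam * edge`, and all function names are hypothetical.

```python
from statistics import mean, pvariance

def patch_activity(event_map, y, x, r):
    """Fraction of pixels in the (2r+1)x(2r+1) patch around (y, x) that fired at
    least one event; the patch is clipped at the image border."""
    h, w = len(event_map), len(event_map[0])
    active = total = 0
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            total += 1
            active += event_map[yy][xx] != 0
    return active / total

def extract_edges(event_maps, r=1, min_mean=0.3, max_var=0.02):
    """Hypothetical temporal-coherence test over K consecutive event maps.

    A pixel is labeled edge (1) if it fired in at least one map and its patch
    activity is both high on average and stable over time (low variance).
    Sparse noise fails the mean test; bursty activity fails the variance test."""
    h, w = len(event_maps[0]), len(event_maps[0][0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not any(m[y][x] for m in event_maps):
                continue  # keep the mask thin: the pixel itself must have fired
            series = [patch_activity(m, y, x, r) for m in event_maps]
            if mean(series) >= min_mean and pvariance(series) <= max_var:
                edges[y][x] = 1
    return edges

def edge_weighted_l1(rendered, target, edges, lam=4.0):
    """Hypothetical edge-weighted L1 loss: edge pixels get weight 1 + lam,
    all other pixels weight 1; the result is a weighted mean."""
    num = den = 0.0
    for r_row, t_row, e_row in zip(rendered, target, edges):
        for r_val, t_val, e in zip(r_row, t_row, e_row):
            wgt = 1.0 + lam * e
            num += wgt * abs(r_val - t_val)
            den += wgt
    return num / den

# Toy example: a vertical edge in column 2 fires in all three maps,
# plus one isolated noise event at (0, 4) in the first map only.
maps = [[[1 if x == 2 else 0 for x in range(5)] for _ in range(5)] for _ in range(3)]
maps[0][0][4] = 1
edges = extract_edges(maps)
rendered = [[0.0] * 5 for _ in range(5)]
target = [[1.0 if x == 2 else 0.0 for x in range(5)] for _ in range(5)]
print(edge_weighted_l1(rendered, target, edges))           # ≈ 0.556 (edge errors up-weighted)
print(edge_weighted_l1(rendered, target, edges, lam=0.0))  # 0.2 (plain L1 mean)
```

In the toy example only column 2 is labeled as edge: its neighbors never fired, and the isolated noise event has too little patch activity to pass the mean test. Up-weighting edge pixels makes errors on scene structure dominate the loss (about 0.556 here versus 0.2 for plain L1).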

Paper Structure

This paper contains 15 sections, 6 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Edge-guided reconstruction framework. Our pipeline extracts robust edges from consecutive event maps (edge detection section), initializes edge-aware Gaussians (edge-aware initialization section), and applies edge-guided losses during joint optimization of the 3D scene representation and camera trajectory (edge-guided reconstruction section). Depth sampling generates edge-guided Gaussians along viewing rays using an inverse-depth distribution, while surface sampling randomly initializes complementary Gaussians to cover texture-less regions. k-NN stands for k-nearest neighbors.
  • Figure 2: Absolute trajectory error (ATE) versus sequence length.
  • Figure 3: Qualitative results on the Replica dataset. Red boxes highlight regions of interest for comparison. Our method produces sharper boundaries and cleaner surfaces than the baselines. IncEventGS shows failures including wave-like artifacts, missing details, and indistinct boundaries. IncEventGS† exhibits severe reconstruction failures due to accumulated trajectory estimation errors.
  • Figure 4: Impact of trajectory error on reconstruction quality. (a) Ground truth. (b) IncEventGS exhibits multiple failure modes: spatial misalignment causing viewpoint shifts and blurred regions in distant areas beyond the initial coverage (top), and disappearing objects (bottom). (c) IncEventGS† fails to reconstruct the scenes due to severe trajectory errors. (d) Our method achieves accurate reconstruction with sharp textures and correct spatial alignment.
  • Figure 5: Effect of the edge-guided loss. Baseline (top) vs. ours with the edge-guided loss but random initialization (bottom), at early and final training stages. Each pair shows the rendered image (left) and the depth map (right). The depth maps show that the edge-guided loss enables faster convergence.
  • ...and 1 more figure
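Figure 1 states that depth sampling places edge-guided Gaussians along viewing rays using an inverse-depth distribution. The following is a minimal sketch of that step, assuming a pinhole camera with intrinsics `fx, fy, cx, cy` and a hypothetical depth range `[d_near, d_far]`. The function names, sample counts, and range are illustrative rather than taken from the paper, and the transform from camera to world coordinates (which would use the estimated pose) is omitted.

```python
import random

def sample_inverse_depth(n, d_near, d_far, rng):
    """Draw n depths whose inverse is uniform in [1/d_far, 1/d_near)."""
    inv_near, inv_far = 1.0 / d_near, 1.0 / d_far
    return [1.0 / rng.uniform(inv_far, inv_near) for _ in range(n)]

def edge_gaussian_centers(edge_pixels, fx, fy, cx, cy, n_per_pixel=4,
                          d_near=0.5, d_far=10.0, seed=0):
    """Hypothetical helper: back-project each edge pixel (u, v) along its
    viewing ray at inverse-depth samples. Returned points (x, y, z) are in
    the camera frame, with z equal to the sampled depth."""
    rng = random.Random(seed)
    centers = []
    for u, v in edge_pixels:
        # Pinhole ray direction scaled so that its z-component is 1.
        dx, dy = (u - cx) / fx, (v - cy) / fy
        for d in sample_inverse_depth(n_per_pixel, d_near, d_far, rng):
            centers.append((dx * d, dy * d, d))
    return centers

pts = edge_gaussian_centers([(320.0, 240.0), (100.0, 50.0)],
                            fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                            n_per_pixel=500, d_near=0.5, d_far=10.0, seed=1)
zs = sorted(z for _, _, z in pts)
print(len(pts), zs[len(zs) // 2])  # median depth lies well below the range midpoint
```

Sampling uniformly in inverse depth concentrates candidates near the camera, where a pixel covers a smaller metric extent, while still placing a sparser set of candidates at larger depths.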