EventNeuS: 3D Mesh Reconstruction from a Single Event Camera

Shreyas Sachan, Viktor Rudnev, Mohamed Elgharib, Christian Theobalt, Vladislav Golyanik

TL;DR

EventNeuS introduces a self-supervised framework to reconstruct dense 3D meshes from monocular event streams by learning a neural implicit surface (SDF) together with a view-dependent radiance field. It replaces traditional view encoding with spherical harmonics, employs hierarchical sampling and frequency annealing, and optimizes a loss that aligns rendered temporal changes with the observed events, enabling mesh extraction via Marching Cubes. On synthetic and real data, it achieves substantial improvements in Chamfer distance and MAE over prior event-based methods, demonstrating robust surface recovery under fast motion and challenging lighting. The approach advances event-driven 3D reconstruction by leveraging implicit surface representations, while acknowledging limitations in large-scale scenes and texture-induced artefacts, and pointing to future integration with RGB-based priors or 3D Gaussian splatting.

Abstract

Event cameras offer a compelling alternative to RGB cameras in many scenarios. While there are recent works on event-based novel-view synthesis, dense 3D mesh reconstruction remains scarcely explored and existing event-based techniques are severely limited in their 3D reconstruction accuracy. To address this limitation, we present EventNeuS, a self-supervised neural model for learning 3D representations from monocular colour event streams. Our approach, for the first time, combines 3D signed distance function and density field learning with event-based supervision. Furthermore, we introduce spherical harmonics encodings into our model for enhanced handling of view-dependent effects. EventNeuS outperforms existing approaches by a significant margin, achieving 34% lower Chamfer distance and 31% lower mean absolute error on average compared to the best previous method.
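
As a rough illustration of the headline metrics, the sketch below computes a symmetric Chamfer distance between two small point sets and the relative reduction between two scores. The function names `chamfer_distance` and `relative_reduction` are hypothetical, and the convention used here (unsquared Euclidean distances, averaged per direction and then summed) is an assumption; the paper's evaluation protocol may differ, for example in how points are sampled from the meshes.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets (assumed convention).

    Each direction is the mean distance from a point to its nearest
    neighbour in the other set; the two directions are summed.
    """
    def one_sided(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


def relative_reduction(baseline, ours):
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline


# Toy point sets (illustrative only, not data from the paper).
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(a, b)

# Single-scene Chamfer distances quoted in the Figure 1 caption.
fig1_reduction = relative_reduction(0.298, 0.107)
```

Applied to the single-scene values quoted in the Figure 1 caption (0.298 for the baseline, 0.107 for EventNeuS), `relative_reduction` gives roughly 0.64. The 34% in the abstract is an average over scenes, so the two numbers are not expected to match.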

Paper Structure (32 sections, 11 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Left: A moving monocular event camera capturing asynchronous per-pixel brightness changes, with red (positive) and blue (negative) polarities representing pixel intensity changes over time. Centre: The captured event stream is processed by the proposed EventNeuS to reconstruct a detailed 3D mesh. Right: Compared to the previous state-of-the-art EventNeRF, our method produces a more accurate reconstruction with a lower Chamfer Distance (CD: 0.107 vs 0.298), highlighting its effectiveness in event-based 3D reconstruction.
  • Figure 2: Overview of EventNeuS. We start with the trajectory that captures the object from multiple viewpoints (e.g., Seiffert's spherical spiral). We accumulate all events within a time window $[t_0, t_1]$ to form an event frame $E^k(t_0, t_1)$ (\ref{subsec:accumulation}). For each event frame, we randomly choose a mini-batch of pixels and sample points along the corresponding rays. After applying positional encoding (\ref{sec:freq-anneal}) to these 3D coordinates, we feed them into the $f_{\text{sdf}}$ network, which outputs a Signed Distance Function (SDF) value and its gradients. We further refine our sampling near surfaces via importance sampling (\ref{sec:imp-sampling}). Next, we combine the resulting SDF features and gradients with view directions encoded via spherical harmonics $SH(d)$ in the $f_{\text{colour}}$ network to predict colour (\ref{sec:sh_encoding}). Finally, we convert the SDF to a density field using \ref{sec:sdf2density}, integrate it along each ray to obtain the accumulated transmittance $\alpha$, and use the colour predictions to get the rendered colour $c$ through volumetric rendering. We render two views (at the start and at the end of the event window) and take their difference, applying a Mean Squared Error (MSE) loss against the ground-truth accumulated event frame (\ref{sec:self-supervision}). This difference enforces consistency between the rendered scene changes and the actual events recorded during $[t_0, t_1]$.
  • Figure 3: Qualitative comparison of our method with challenging baselines on the synthetic NeRF dataset (first row: Chair; second row: Mic). Note that all results shown are mesh reconstructions, not volumetric renderings.
  • Figure 4: Qualitative comparison on the real EventNeRF dataset with fast-rotating objects observed by an event camera. All results shown are mesh reconstructions, not volumetric renderings. The RGB views are provided for reference only. Best viewed with zoom.
  • Figure 5: SH encoding effectiveness on textured meshes (left: with SH, right: without SH). SH encoding enables more detailed surfaces and a better representation of view-dependent effects.
  • ...and 5 more figures
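
The supervision described in the Figure 2 caption can be sketched as follows: events inside a window are summed per pixel into an event frame, and the difference between two renderings is compared against that frame with an MSE loss. This is a minimal illustration under stated assumptions, not the authors' implementation. The event tuple layout `(t, x, y, polarity)`, the function names, the log-brightness difference, and the contrast threshold value of 0.25 are all assumptions borrowed from the common event-camera model; the caption itself only states that the difference of two rendered views is matched to the accumulated events. Rendered views are stood in for by nested lists of brightness values, and colour (Bayer) handling is ignored.

```python
import math

THRESHOLD = 0.25  # assumed contrast threshold, not a value from the paper
EPS = 1e-5        # avoids log(0) for dark pixels


def accumulate_events(events, t0, t1, width, height):
    """Sum event polarities per pixel over the window [t0, t1]."""
    frame = [[0] * width for _ in range(height)]
    for t, x, y, polarity in events:
        if t0 <= t <= t1:
            frame[y][x] += polarity
    return frame


def event_mse_loss(render_t0, render_t1, event_frame, pixels):
    """MSE between the rendered log-brightness change and the event frame.

    `pixels` is the sampled mini-batch of (x, y) coordinates.
    """
    total = 0.0
    for x, y in pixels:
        predicted = math.log(render_t1[y][x] + EPS) - math.log(render_t0[y][x] + EPS)
        observed = THRESHOLD * event_frame[y][x]
        total += (predicted - observed) ** 2
    return total / len(pixels)


# Toy 2x2 sensor: (t, x, y, polarity); the last event lies outside the window.
events = [(0.1, 0, 0, +1), (0.2, 0, 0, +1), (0.3, 1, 0, -1), (0.9, 1, 1, +1)]
frame = accumulate_events(events, 0.0, 0.5, width=2, height=2)
pixels = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Case 1: the two renderings are identical, so every event is unexplained.
flat = [[0.5, 0.5], [0.5, 0.5]]
loss_static = event_mse_loss(flat, flat, frame, pixels)

# Case 2: the second rendering changes exactly as the events predict.
matched = [[(0.5 + EPS) * math.exp(THRESHOLD * frame[y][x]) - EPS
            for x in range(2)] for y in range(2)]
loss_match = event_mse_loss(flat, matched, frame, pixels)
```

In this toy setup the loss is positive when the rendered scene does not change despite recorded events, and it drops to numerical zero when the rendered brightness change agrees with the accumulated events, which is the consistency the caption describes.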