AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene

Chaoran Feng, Wangbo Yu, Xinhua Cheng, Zhenyu Tang, Junwu Zhang, Li Yuan, Yonghong Tian

TL;DR

AE-NeRF tackles robust event-based 3D reconstruction under non-ideal pose and event-density conditions and scales to larger scenes. It introduces a pose-correction module that yields continuous SE(3) trajectories from asynchronous events, and a hierarchical two-phase e-NeRF with scene warping that maintains geometric consistency across views. Training is regularized with event-distillation, event-reconstruction, temporal, and distortion losses, and a learning-based color correction network refines tone mapping; the method achieves state-of-the-art results on synthetic and real datasets. The work broadens the practical applicability of event-based NeRFs and provides benchmarks for non-ideal, large-scale scenes.
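
A continuous trajectory means a pose can be queried at any event timestamp, not only at the discrete timestamps where poses were estimated. As a point of reference only, the sketch below shows the simplest non-learned way to do that: spherical linear interpolation (SLERP) of the rotation plus linear interpolation of the translation between two neighboring poses. This is not AE-NeRF's pose correction network, which learns the mapping from timestamp-pose pairs; the pose representation (unit quaternion plus translation) and the names `slerp` and `interpolate_pose` are assumptions made for illustration.

```python
import math

# Hypothetical pose representation: (unit quaternion (w, x, y, z), translation (x, y, z)).
Quat = tuple[float, float, float, float]
Vec3 = tuple[float, float, float]
Pose = tuple[Quat, Vec3]


def slerp(q0: Quat, q1: Quat, u: float) -> Quat:
    """Spherical linear interpolation between unit quaternions for u in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip q1 to interpolate along the shorter arc.
        q1 = tuple(-b for b in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation avoids dividing by sin(~0).
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    w0 = math.sin((1.0 - u) * theta) / math.sin(theta)
    w1 = math.sin(u * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def interpolate_pose(t: float, t_i: float, pose_i: Pose, t_next: float, pose_next: Pose) -> Pose:
    """Query a pose at any timestamp t in [t_i, t_next] from two discrete poses."""
    u = (t - t_i) / (t_next - t_i)
    rot = slerp(pose_i[0], pose_next[0], u)
    trans = tuple(a + u * (b - a) for a, b in zip(pose_i[1], pose_next[1]))
    return rot, trans
```

For example, `interpolate_pose(t_samp, t_i, pose_i, t_next, pose_next)` gives a pose for an intermediate sampled timestamp. Rotation and translation are interpolated independently here rather than along a true SE(3) geodesic (screw motion), and the baseline simply trusts the input poses; a learned correction is what allows noisy input poses to be refined.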

Abstract

Compared to frame-based methods, computational neuromorphic imaging with event cameras offers significant advantages, such as minimal motion blur, high temporal resolution, and high dynamic range. The multi-view consistency of Neural Radiance Fields (NeRF), combined with the unique benefits of event cameras, has spurred recent research into reconstructing NeRFs from data captured by moving event cameras. Although existing methods show impressive performance, they assume ideal conditions, namely uniform, high-quality event sequences and accurate camera poses, and they mainly focus on object-level reconstruction, which limits their practical applications. In this work, we propose AE-NeRF to address the challenges of learning event-based NeRF under non-ideal conditions, including non-uniform event sequences, noisy poses, and scenes of various scales. Our method exploits the density of event streams and jointly learns a pose correction module with an event-based NeRF (e-NeRF) framework for robust 3D reconstruction from inaccurate camera poses. To generalize to larger scenes, we propose hierarchical event distillation with a proposal e-NeRF network and a vanilla e-NeRF network that resample and refine the reconstruction. We further propose an event reconstruction loss and a temporal loss to improve the view consistency of the reconstructed scene. We establish a comprehensive benchmark that includes large-scale scenes simulating practical non-ideal conditions, incorporating both synthetic and challenging real-world event datasets. Experimental results show that our method achieves a new state of the art in event-based 3D reconstruction.
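
The abstract names an event reconstruction loss without giving its formula. Event-based NeRFs commonly supervise with the event generation model: between two timestamps, the change in a pixel's rendered log-radiance should approximately equal the contrast threshold $C$ times the net polarity of the events fired at that pixel. The sketch below assumes exactly that model, with a known constant threshold and a squared-error penalty; `Event`, `polarity_sum`, and `event_reconstruction_loss` are hypothetical names, and AE-NeRF's actual loss may differ (for example in normalization or in how $C$ is handled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    x: int
    y: int
    t: float
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def polarity_sum(events: list[Event], x: int, y: int, t_start: float, t_end: float) -> int:
    """Net polarity fired at pixel (x, y) in the window (t_start, t_end]."""
    return sum(e.polarity for e in events
               if e.x == x and e.y == y and t_start < e.t <= t_end)


def event_reconstruction_loss(pred_log_start: list[float], pred_log_end: list[float],
                              net_polarity: list[int], threshold: float) -> float:
    """Mean squared error between rendered log-radiance changes and threshold * net polarity.

    Each list holds one entry per sampled pixel; threshold is the (assumed known)
    contrast threshold C of the event camera.
    """
    residuals = [(end - start) - threshold * p
                 for start, end, p in zip(pred_log_start, pred_log_end, net_polarity)]
    return sum(r * r for r in residuals) / len(residuals)
```

With two positive events at a pixel and $C = 0.2$, a rendered log-radiance that rises by 0.4 over the window incurs zero loss; any other change is penalized quadratically.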

Paper Structure (38 sections, 20 equations, 8 figures, 6 tables, 1 algorithm)

Figures (8)

  • Figure 1: Comparison of novel view synthesis (NVS) and pose correction using existing event-based methods. The scene is captured by an event camera with 360-degree non-uniform motion, and poses are estimated with COLMAP.
  • Figure 2: Overview of AE-NeRF. For each event $e$ in a batch $\mathcal{E}$ randomly sampled from the raw sequences, we sample a timestamp $t_{\text{samp}}$ between the previous timestamp $t_i$ and the current timestamp $t_{i+1}$. A pose correction network then uses timestamp-pose pairs to interpolate the discrete poses at dense timestamps, yielding corrected poses at $t_i$, $t_{i+1}$, and $t_{\text{samp}}$. With these corrected poses, the event ray is processed through scene warping, and a two-stage e-NeRF resamples weights and distances to infer the predicted log-radiance of pixel $\mathbf{v}$. The predicted event reconstruction difference and temporal gradient are then compared against the ground truth, with distillation loss and distortion loss used for regularization. Finally, a learning-based approach performs color correction to refine tone mapping.
  • Figure 3: Framework of Pose Correction Network and Color Correction Network.
  • Figure 4: Qualitative Comparison of Novel View Synthesis with AE-NeRF.
  • Figure 5: Proposed Synthetic Datasets of AE-NeRF.
  • ...and 3 more figures
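
Figure 2 lists a distortion loss among the regularizers but does not define it. In NeRFs with proposal networks, "distortion loss" usually refers to the mip-NeRF 360 regularizer, which penalizes weight mass spread out along a ray: $\mathcal{L}_{\text{dist}} = \sum_{i,j} w_i w_j \left|\frac{s_i+s_{i+1}}{2} - \frac{s_j+s_{j+1}}{2}\right| + \frac{1}{3}\sum_i w_i^2 (s_{i+1}-s_i)$, where $s$ are normalized interval endpoints and $w$ are rendering weights. Assuming AE-NeRF uses that formulation, a direct single-ray sketch (the name `distortion_loss` is hypothetical) looks like this:

```python
def distortion_loss(s: list[float], w: list[float]) -> float:
    """mip-NeRF 360-style distortion loss for one ray.

    s: n + 1 sorted interval endpoints along the ray (normalized distances).
    w: n rendering weights, one per interval [s[i], s[i + 1]).
    """
    if len(s) != len(w) + 1:
        raise ValueError("need exactly one more endpoint than weights")
    n = len(w)
    mids = [(s[i] + s[i + 1]) / 2.0 for i in range(n)]
    # Inter-interval term: weight on distant intervals is penalized by their midpoint distance.
    inter = sum(w[i] * w[j] * abs(mids[i] - mids[j]) for i in range(n) for j in range(n))
    # Intra-interval term: penalizes weight spread inside each (wide) interval.
    intra = sum(w[i] ** 2 * (s[i + 1] - s[i]) for i in range(n)) / 3.0
    return inter + intra
```

For endpoints `[0.0, 1.0, 2.0]`, weights `[1.0, 0.0]` give 1/3 while `[0.5, 0.5]` give 2/3, so the loss favors compact, surface-like weight distributions. The double loop is quadratic in the number of samples; because the midpoints are sorted, the same value can be computed in linear time with cumulative sums.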