E$^{3}$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images

Yunshan Qi, Jia Li, Yifan Zhao, Yu Zhang, Lin Zhu

TL;DR

E3NeRF introduces an efficient framework to reconstruct sharp Neural Radiance Fields from blurry images by leveraging synchronized event streams. It couples image formation and event generation through blur and event rendering losses, and further improves training efficiency with an event-guided spatial-temporal blur model and motion-aware event splitting. An event-guided pose estimation method enables real-world applicability without ground-truth poses. Across synthetic and real-world datasets, E3NeRF outperforms image-based, ERGB-based, and prior event-based NeRF approaches, especially under non-uniform, high-speed blur and low-light conditions. The work advances practical 3D scene understanding under challenging capture conditions and provides a benchmark for future ERGB-augmented NeRF research.

Abstract

Neural Radiance Fields (NeRF) achieve impressive novel view rendering performance by learning an implicit 3D representation from sparse-view images. However, it is difficult to reconstruct a sharp NeRF from blurry input, which often occurs in the wild. To solve this problem, we propose a novel Efficient Event-Enhanced NeRF (E$^{3}$NeRF), which reconstructs a sharp NeRF from both blurry images and the corresponding event streams. A blur rendering loss and an event rendering loss are introduced; they guide NeRF training by modeling the physical image motion blur process and the event generation process, respectively. To improve the efficiency of the framework, we further leverage the latent spatial-temporal blur information in the event stream to distribute training evenly over temporal blur and to focus training on spatial blur. Moreover, an event-guided camera pose estimation framework is built for real-world data, generalizing the method to more practical applications. Compared to previous image-based and event-based NeRF works, our framework makes deeper use of the internal relationship between events and images. Extensive experiments on both synthetic and real-world data demonstrate that E$^{3}$NeRF can effectively learn a sharp NeRF from blurry images, especially for high-speed non-uniform motion and low-light scenes.
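The two losses named in the abstract can be illustrated for a single pixel with a minimal sketch. It assumes that the blurry color is the average of the virtual sharp colors rendered along the exposure (the $b+1$ colors mentioned in the Figure 4 caption), and that an event bin is the log-intensity change between consecutive virtual frames divided by a contrast threshold, which is the common event generation model and not necessarily the paper's exact formulation. The function names, the luminance weights, the threshold value of 0.3, and the use of a plain mean squared error are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def luminance(color: RGB) -> float:
    """Grayscale intensity of an RGB color (Rec. 601 weights, an assumption)."""
    return 0.299 * color[0] + 0.587 * color[1] + 0.114 * color[2]


def predict_blur_color(sharp_colors: Sequence[RGB]) -> RGB:
    """Average the b+1 virtual sharp colors to approximate the blurry color."""
    n = len(sharp_colors)
    return tuple(sum(c[ch] for c in sharp_colors) / n for ch in range(3))


def predict_event_bins(sharp_colors: Sequence[RGB],
                       threshold: float = 0.3,
                       eps: float = 1e-5) -> List[float]:
    """Estimate b event bins from b+1 sharp colors.

    Each bin is the log-intensity difference between consecutive virtual
    frames, expressed in units of the (assumed) contrast threshold.
    """
    logs = [math.log(luminance(c) + eps) for c in sharp_colors]
    return [(logs[k + 1] - logs[k]) / threshold for k in range(len(logs) - 1)]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, used here for both the blur and the event loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Usage: three virtual sharp colors for one pixel (b = 2 event bins).
sharp = [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4), (0.4, 0.4, 0.4)]
pred_blur = predict_blur_color(sharp)
pred_bins = predict_event_bins(sharp)
blur_loss = mse(pred_blur, (0.35, 0.35, 0.35))   # hypothetical observed color
event_loss = mse(pred_bins, [2.0, 0.0])          # hypothetical observed bins
```

In an actual training loop the sharp colors would come from volume rendering at poses sampled within the exposure, and both losses would be computed on differentiable tensors; the sketch only shows how one set of rendered colors feeds both supervision signals.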

Paper Structure

This paper contains 40 sections, 29 equations, 18 figures, 10 tables.

Figures (18)

  • Figure 1: In a low-light static scene, handheld traditional cameras often capture blurry images. Image-based deblurring NeRF methods, such as BAD-NeRF and Deblur-NeRF, fail to synthesize sharp novel view images when facing severe blur. With an event camera, we can capture the event streams corresponding to the blurry images. By using the light intensity change information in event data, E2NeRF achieves a primary deblurring effect. In E3NeRF, we further extract and utilize the spatial-temporal blur information in events to spread training evenly on temporal blur and focus training on spatial blur. Additionally, we use motion-guided event splitting to determine the suitable number of event bins for each view. Eventually, E3NeRF realizes impressive implicit 3D representation learning results and high efficiency training in complex scenes with severely blurry input.
  • Figure 2: Comparison of input events under two different settings in Table 1.
  • Figure 3: Temporal blur uniform binning. The left part of the figure is a blurry image caused by non-uniform motion, and the middle part is the corresponding event stream and visualized event bins. The right part of the figure shows the camera motion blur trajectory, motion speed, and the corresponding time axis. With the event-guided temporal blur binning, the time division points of event bins are concentrated on the moments with high motion speed (yellow arrow), and the camera poses corresponding to the time points are evenly distributed on the camera motion trajectory (red triangles on the black curve in the blue box).
  • Figure 4: The architecture of E3NeRF. The input is sparse views of blurry images and corresponding event streams. The figure shows the operation of one view as an example. For real-world data, we use the event-guided pose estimation model to obtain the pose sequence. For synthetic data, we use ground-truth poses as in NeRF. Then, we use the event-guided spatial-temporal blur model to focus the training on areas with spatial blur and distribute the training evenly over temporal blur. Simultaneously, we use motion-guided event splitting to split events for each view individually. For blurry pixels, as shown with the red arrows, the network renders $b+1$ virtual sharp colors, with which we calculate the predicted blurry color $\hat{C}_{blur}(\mathbf{x})$ and event bin $\hat{B}_{k}(\mathbf{x})$. Then, comparing with input color $C(\mathbf{x})$ and event bin $B_{k}(\mathbf{x})$, we get the proposed blur loss and event loss. For sharp pixels, as shown with the green arrows, we solely conduct a sharp loss as in NeRF. The above operation is repeated for each view during training.
  • Figure 5: Motion-guided event splitting. The left part of the figure is three input images of different views with different degrees of blur and their corresponding events under the same exposure time. Notice that the length of the event bins represents the time, and different densities of events are caused by different motion speeds. The middle of the figure shows the calculation of pixel offset $\Delta$ from pose $\mathbf{P}_{start}$ to $\mathbf{P}_{end}$. The right part of the figure shows the split event bins and camera motion.
  • ...and 13 more figures
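
The temporal blur uniform binning described in the Figure 3 caption places bin boundaries densely where the camera moves fast. One simple way to get this behavior, sketched below, is to split the exposure so that every bin holds roughly the same number of events. This relies on the assumption that the event rate grows with motion speed, and it is not claimed to be the paper's exact procedure. The function name `equal_count_split_times` and its parameters are hypothetical.

```python
from typing import List, Sequence


def equal_count_split_times(timestamps: Sequence[float],
                            t_start: float,
                            t_end: float,
                            num_bins: int) -> List[float]:
    """Return num_bins + 1 division times within [t_start, t_end].

    Boundaries are chosen so each bin contains about the same number of
    events, which makes bins short when events are dense (fast motion,
    under the stated assumption) and long when they are sparse.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    ts = sorted(t for t in timestamps if t_start <= t <= t_end)
    n = len(ts)
    if n == 0:
        # No events: fall back to uniform division in time.
        step = (t_end - t_start) / num_bins
        return [t_start + k * step for k in range(num_bins + 1)]
    times = [t_start]
    for k in range(1, num_bins):
        idx = (k * n) // num_bins  # number of events before boundary k
        times.append(ts[idx - 1] if idx > 0 else t_start)
    times.append(t_end)
    return times


# Usage: sparse events early in the exposure, dense events near the end.
events = [0.1, 0.5, 0.8, 0.85, 0.9, 0.95]
split_times = equal_count_split_times(events, 0.0, 1.0, 3)
durations = [b - a for a, b in zip(split_times[:-1], split_times[1:])]
```

Here the later bins are shorter in time than the first one, so the division points cluster where events are dense. Camera poses interpolated at these times would then be spaced more evenly along the trajectory than poses taken at uniform time steps.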