Mitigating Motion Blur in Neural Radiance Fields with Events and Frames

Marco Cannici, Davide Scaramuzza

TL;DR

This paper explicitly models the blur formation process, exploiting the event double integral as an additional model-based prior, and models the event-pixel response using an end-to-end learnable response function, allowing the method to adapt to non-idealities in the real event-camera sensor.

Abstract

Neural Radiance Fields (NeRFs) have shown great potential in novel view synthesis. However, they struggle to render sharp images when the data used for training is affected by motion blur. On the other hand, event cameras excel in dynamic scenes as they measure brightness changes with microsecond resolution and are thus only marginally affected by blur. Recent methods attempt to enhance NeRF reconstructions under camera motion by fusing frames and events. However, they face challenges in recovering accurate color content or constrain the NeRF to a set of predefined camera poses, harming reconstruction quality in challenging conditions. This paper proposes a novel formulation addressing these issues by leveraging both model- and learning-based modules. We explicitly model the blur formation process, exploiting the event double integral as an additional model-based prior. Additionally, we model the event-pixel response using an end-to-end learnable response function, allowing our method to adapt to non-idealities in the real event-camera sensor. We show, on synthetic and real data, that the proposed approach outperforms existing deblur NeRFs that use only frames as well as those that combine frames and events by +6.13dB and +2.48dB, respectively.
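
As a rough illustration of the event double integral (EDI) prior mentioned above, the sketch below applies the standard EDI relation to a single pixel: the blurry intensity is modeled as the sharp intensity at a reference time multiplied by the time average of `exp(c * E(t))`, where `E(t)` is the net event polarity accumulated between the reference time and `t`. The function name `edi_sharp_intensity`, the contrast threshold `c=0.2`, and the numerical integration over `n_samples` time steps are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def edi_sharp_intensity(blurry, ev_t, ev_p, t_start, t_end, t_ref,
                        c=0.2, n_samples=200):
    """Estimate the sharp intensity at t_ref for one pixel (EDI sketch).

    blurry : blurry intensity, i.e. the average over [t_start, t_end]
    ev_t   : event timestamps at this pixel
    ev_p   : event polarities (+1 or -1)
    c      : contrast threshold (assumed value, sensor dependent)
    """
    ev_t = np.asarray(ev_t, dtype=float)
    ev_p = np.asarray(ev_p, dtype=float)
    order = np.argsort(ev_t)
    ev_t, ev_p = ev_t[order], ev_p[order]

    # cum[k] is the sum of the first k polarities (cum[0] = 0).
    cum = np.concatenate([[0.0], np.cumsum(ev_p)])

    def signed_count(t):
        # Net polarity of all events with timestamp <= t.
        return cum[np.searchsorted(ev_t, t, side="right")]

    ts = np.linspace(t_start, t_end, n_samples)
    # E(t): net polarity between t_ref and t (negated for t < t_ref).
    E = signed_count(ts) - signed_count(t_ref)

    # Blur model: blurry = sharp(t_ref) * mean_t exp(c * E(t)).
    return blurry / np.mean(np.exp(c * E))
```

With no events the estimate equals the blurry input, as expected for a static pixel; with events, the correction factor depends on when the brightness changes occurred inside the exposure.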

Paper Structure (18 sections, 11 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Ev-DeblurNeRF combines blurry images and events to recover sharp NeRFs. A motion-aware NeRF recovers camera motion and a learnable event camera response function models the real camera's non-idealities, enabling high-quality reconstructions.
  • Figure 2: Architecture of the proposed Ev-DeblurNeRF model. For each given ray $\mathbf{r}(\mathbf{u},t)$, placed at the center of the exposure time $\tau$, we estimate a set of warped rays $\mathbf{r}_q$ using $G_{\boldsymbol{\Phi}}$. We then sample features from an explicit volume $\mathcal{V}$ and feed these features to $F_\Omega$ to compute blurry colors through weighted averaging with $\mathcal{L}_\text{blur}$. We supervise the color at the center of the exposure time through $\mathcal{L}_{EDI}$ by recovering a prior-based sharp color using the event double integral, considering all events in the exposure time. Finally, we sample a pair of consecutive events and supervise their brightness difference, modulated by eCRF, using the observed polarity value via $\mathcal{L}_{ev}$.
  • Figure 3: Qualitative comparison on synthetic (top) and real-world camera motion blur (bottom). Ev-DeblurNeRF recovers sharp and fine details, such as the letters in the last example, as well as accurate colors, outperforming other event-based and image-only methods.
  • Figure 4: Analysis of the robustness to sparse training views (left) and motion blur intensity (right) on samples from the Ev-DeblurCDAVIS data.
  • Figure 5: Qualitative ablation study of the main components of the proposed Ev-DeblurNeRF network. Tables below each picture are drawn from Table 3 of the paper, and report the configuration used and the PSNR metric achieved in each case.
  • ...and 4 more figures
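
The two supervision signals described in the Figure 2 caption can be sketched roughly as follows: a blurry color formed as a weighted average of the colors rendered along the warped rays, and an event term that compares the brightness difference between two consecutive events with the observed polarity. The function names, the squared-error form, the threshold value `0.2`, and the plain logarithm standing in for the learnable eCRF are all assumptions for illustration; the paper's actual losses and response function may differ.

```python
import numpy as np

def blurry_color(ray_colors, weights):
    """Weighted average of colors rendered along the warped rays r_q."""
    ray_colors = np.asarray(ray_colors, dtype=float)  # shape (Q, 3)
    weights = np.asarray(weights, dtype=float)        # shape (Q,)
    weights = weights / weights.sum()                 # normalize to sum to 1
    return weights @ ray_colors                       # shape (3,)

def event_loss(bright_prev, bright_curr, polarity,
               threshold=0.2, response=np.log, eps=1e-5):
    """Squared error between predicted and observed brightness change.

    bright_prev, bright_curr : rendered brightness at two consecutive events
    polarity                 : observed event polarity (+1 or -1)
    response                 : stand-in for the learnable eCRF (assumed: log)
    """
    predicted = response(bright_curr + eps) - response(bright_prev + eps)
    return float((predicted - polarity * threshold) ** 2)
```

For example, `blurry_color([[1, 0, 0], [0, 1, 0]], [1, 1])` averages the two ray colors equally, and `event_loss` is close to zero when the log-brightness change matches the polarity times the threshold.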