SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang

TL;DR

This work tackles motion blur in novel view synthesis by leveraging spike cameras, which record absolute intensity at high temporal resolution, to provide sharp texture signals for NeRF and 3DGS training. It introduces Texture from Spike (TfS) loss, combining Texture from Interval and Texture from Playback reconstructions, plus a learnable grayscale converter to fuse spike-derived texture with color rendering in an end-to-end pipeline (SpikeNeRF). A synchronized spike-RGB camera system and a real-world RS-3D dataset are developed to validate improvements over state-of-the-art baselines, including event-based methods, with notable gains in PSNR, SSIM, and perceptual quality, while reducing computational overhead. The approach enables high-quality, blur-free NVS in both synthetic and real scenes and suggests broader benefits such as improved pose estimation and dynamic-range handling, with plans to extend spike-based INR research to additional domains.

Abstract

One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. These event-RGB approaches have limitations, however, such as high training costs and an inability to work effectively on the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By treating texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of the temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It also handles foreground objects and backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research. We conduct extensive experiments on synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. The code and dataset will be made available for public access.

Paper Structure (26 sections, 12 equations, 10 figures, 4 tables)

This paper contains 26 sections, 12 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Limitations of event-RGB methods. (a): Neuromorphic cameras can capture much richer temporal information than conventional RGB cameras. However, event streams are sparser than spike streams and cannot capture static background information. (b): Integrating event streams into NVS methods requires high training costs. Taking E2NeRF as an example, achieving higher accuracy requires increasing the number of independent renderings, resulting in significantly higher costs (top row). The reason for this is that simulating events requires the use of asynchronous differentiation (bottom row).
  • Figure 1: Comparative Architecture of NeRF and SpikeNeRF.
  • Figure 2: Illustration of the proposed TfS loss. (a): The Texture from Interval (TFI) and Texture from Playback (TFP) algorithms are commonly employed for spike reconstruction. TFP downsamples along the time axis, while TFI calculates the eligibility trace of each pixel. Both methods encounter a trade-off between noise and sharpness. (b): The Texture from Spike (TfS) loss combines the losses from TFI and TFP reconstructions, resulting in learned reconstructions that balance noise and sharpness.
  • Figure 2: The right column presents a comparative overview of price and configuration between spike, event, and regular high-speed cameras. The left column showcases the potential applications of the next generation spike camera.
  • Figure 3: SpikeNeRF - Introducing spike streams for NeRF deblur. The SpikeNeRF pipeline comprises three stages. To address the issue of blurry RGB images caused by relative motion, we utilize a synchronized spike-RGB camera system to capture spike streams. A detailed configuration of the GoPro9 RGB camera and Vidar spike camera used in this system can be found in Tab. 1. The use of a beam splitter ensures a synchronized field of view. Additionally, to mitigate potential misalignment resulting from transmission delays, we incorporate a high-precision clock (0.0001s) at the trigger alignment side as an additional safeguard. On the Texture from Spike (TfS) front, the trade-off between noise and sharpness depicted in Fig. 2 can be effectively balanced during learning by employing both TFP and TFI spike reconstruction techniques. Regarding color rendering, we employ learnable layers to weigh each RGB channel's sum into grayscale, which exhibits greater potential compared to the standard conversion
  • ...and 5 more figures
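
The captions for Figures 2 and 3 describe two spike reconstructions (TFP and TFI), a loss that combines them, and a weighted RGB-to-grayscale conversion. The sketch below illustrates that idea with NumPy and is not the authors' implementation. It assumes the commonly used formulations, in which TFP estimates intensity as the firing rate inside a time window and TFI as the reciprocal of the inter-spike interval spanning the query time. The function names, the `(T, H, W)` binary spike layout, the unit scaling, and the choice of summing two mean-squared errors are all assumptions made for illustration; in training, `weights` would be learnable parameters rather than a fixed array.

```python
import numpy as np


def tfp(spikes, t, window):
    """Texture from Playback (assumed form): firing rate in a window around t.

    spikes: binary array of shape (T, H, W); returns an (H, W) array in [0, 1].
    """
    lo = max(0, t - window // 2)
    hi = min(spikes.shape[0], lo + window)
    return spikes[lo:hi].mean(axis=0)


def tfi(spikes, t):
    """Texture from Interval (assumed form): 1 / inter-spike interval spanning t.

    Pixels without a spike on both sides of t are left at 0.
    """
    _, height, width = spikes.shape
    out = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            fired = np.flatnonzero(spikes[:, y, x])
            prev, nxt = fired[fired <= t], fired[fired > t]
            if prev.size and nxt.size:
                out[y, x] = 1.0 / (nxt[0] - prev[-1])
    return out


def to_gray(rgb, weights):
    """Weighted sum of the RGB channels; rgb is (H, W, 3), weights is (3,)."""
    return rgb @ weights


def tfs_loss(rgb, spikes, t, window, weights):
    """Hypothetical TfS-style loss: MSE against TFP plus MSE against TFI."""
    gray = to_gray(rgb, weights)
    loss_tfp = np.mean((gray - tfp(spikes, t, window)) ** 2)
    loss_tfi = np.mean((gray - tfi(spikes, t)) ** 2)
    return loss_tfp + loss_tfi
```

Using both reconstructions as targets follows the trade-off the Figure 2 caption describes: each reconstruction sits at a different point between noise and sharpness, so supervising the rendered grayscale image with both lets training settle between them.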