SweepEvGS: Event-Based 3D Gaussian Splatting for Macro and Micro Radiance Field Rendering from a Single Sweep

Jingqian Wu, Shuo Zhu, Chutian Wang, Boxin Shi, Edmund Y. Lam

TL;DR

SweepEvGS addresses the bottleneck of dense, high-quality frame capture for radiance-field rendering by integrating monocular event cameras with 3D Gaussian Splatting to synthesize novel views from a single camera sweep. By using the initial static frame plus dense asynchronous events, the method trains an end-to-end pipeline that supervises radiance-field reconstruction with event-derived signals, augmented by a linlog-based luminance difference and a D-SSIM term. The authors demonstrate robustness across synthetic, real-world macro, and real-world microscopic imaging settings, showing substantial gains in rendering quality and orders-of-magnitude improvements in speed and efficiency over NeRF-based approaches. This work highlights the practical potential of event-based radiance-field rendering for dynamic environments, enabling fast, high-fidelity view synthesis with minimal data collection and hardware constraints.
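
To make the linlog-based luminance difference concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a linlog mapping that is linear below a switch point and logarithmic above it, a contrast threshold that converts accumulated event polarities into a brightness change, and a mean-squared error between the two. The names `linlog`, `luminance_difference_loss`, and `LINLOG_THRESHOLD`, as well as the values 20 and 0.25, are illustrative assumptions. The D-SSIM term is left out.

```python
import numpy as np

# Assumed switch point between the linear and logarithmic regimes (0-255 scale).
LINLOG_THRESHOLD = 20.0


def linlog(intensity, threshold=LINLOG_THRESHOLD):
    """Map intensity linearly below `threshold` and logarithmically above it.

    The linear slope is chosen so both branches meet at `threshold`.
    """
    intensity = np.asarray(intensity, dtype=np.float64)
    linear = intensity * np.log(threshold) / threshold
    # Clamp before the log so dark pixels do not produce -inf in the unused branch.
    logarithmic = np.log(np.maximum(intensity, threshold))
    return np.where(intensity < threshold, linear, logarithmic)


def luminance_difference_loss(render_start, render_end, event_sum,
                              contrast_threshold=0.25):
    """Mean squared error between rendered and event-derived brightness change.

    render_start, render_end: rendered intensities at the two ends of a window.
    event_sum: per-pixel sum of event polarities (+1 / -1) inside that window.
    contrast_threshold: assumed brightness step signalled by one event.
    """
    predicted = linlog(render_end) - linlog(render_start)
    target = contrast_threshold * np.asarray(event_sum, dtype=np.float64)
    return float(np.mean((predicted - target) ** 2))
```

In this sketch the loss vanishes when the rendered brightness change between two viewpoints matches what the events recorded, which is the sense in which events can supervise a renderer without sharp intermediate frames.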

Abstract

Recent advancements in 3D Gaussian Splatting (3D-GS) have demonstrated the potential of using 3D Gaussian primitives for high-speed, high-fidelity, and cost-efficient novel view synthesis from continuously calibrated input views. However, conventional methods require high-frame-rate dense and high-quality sharp images, which are time-consuming and inefficient to capture, especially in dynamic environments. Event cameras, with their high temporal resolution and ability to capture asynchronous brightness changes, offer a promising alternative for more reliable scene reconstruction without motion blur. In this paper, we propose SweepEvGS, a novel hardware-integrated method that leverages event cameras for robust and accurate novel view synthesis across various imaging settings from a single sweep. SweepEvGS utilizes the initial static frame with dense event streams captured during a single camera sweep to effectively reconstruct detailed scene views. We also introduce different real-world hardware imaging systems for real-world data collection and evaluation for future research. We validate the robustness and efficiency of SweepEvGS through experiments in three different imaging settings: synthetic objects, real-world macro-level, and real-world micro-level view synthesis. Our results demonstrate that SweepEvGS surpasses existing methods in visual rendering quality, rendering speed, and computational efficiency, highlighting its potential for dynamic practical applications.
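
As a rough illustration of how the event stream recorded between two moments of the sweep can become a per-pixel signal, the sketch below sums event polarities per pixel over a time window. The event layout (parallel arrays of x, y, polarity in {-1, +1}, and timestamp) and the function name `accumulate_events` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def accumulate_events(xs, ys, polarities, timestamps,
                      t_start, t_end, height, width):
    """Sum event polarities per pixel for events with t_start <= t < t_end."""
    xs = np.asarray(xs, dtype=np.int64)
    ys = np.asarray(ys, dtype=np.int64)
    polarities = np.asarray(polarities, dtype=np.float64)
    timestamps = np.asarray(timestamps, dtype=np.float64)

    in_window = (timestamps >= t_start) & (timestamps < t_end)
    frame = np.zeros((height, width), dtype=np.float64)
    # np.add.at accumulates correctly when the same pixel fires several times.
    np.add.at(frame, (ys[in_window], xs[in_window]), polarities[in_window])
    return frame
```

Combined with the initial static frame, such an accumulated map gives an estimate of how brightness has changed at each pixel since the sweep began.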

Paper Structure

This paper contains 25 sections, 10 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Frame-based captures exhibit blur in real-world macro imaging settings under fast camera/object movement (a). Blur also occurs in real-world micro imaging settings such as microscopy under drastic object displacement (b). SweepEvGS renders sharp, accurate, and detailed radiance field representations across various real-world hardware imaging settings (c) and (d).
  • Figure 2: Overview of the SweepEvGS pipeline for novel view synthesis. The process begins with capturing a static object or scene using an event camera (a). The camera captures signals (b) that include the initial frame, taken while the camera is static before the sweep starts, and the dense event streams representing intensity changes during the camera sweep. On the one hand, these signals are used to form (purple arrow) the ground truth signals for training and supervision, as described in the Event Stream Utilization section. On the other hand, the 3D-GS model, described in the 3D-GS section, renders (green arrow) a view for the corresponding camera position and pose, and converts (red arrow) the rendered results into predictions. The designed loss supervises the training pipeline, as described in the Supervision section. Our method efficiently bridges the gap between sparse event data and dense scene reconstruction, enabling high-quality novel view synthesis.
  • Figure 3: Demonstration of Hardware Setup. (a) Real-world macro camera system setup for objects with depth: the DAVIS 346C event camera is fixed, and the object is placed on a fast turntable with a known constant speed. The camera captures events when the viewing angle of the object changes. (b) Real-world macro camera system setup for plain objects: the DAVIS 346C event camera is placed on a translation station that enables horizontal linear movements in both the red and blue directions. (c) Real-world microscopic camera system setup: the DAVIS 346C event camera is placed on a widefield microscope that enables horizontal linear movements in both the red and blue directions. The camera captures events and frames from the object placed on the observation plate.
  • Figure 4: A visual comparison of our approach against others on real-world datasets: the frames captured by a traditional frame-based camera (a) exhibit motion blur; a traditional frame-based radiance field approach such as 3D-GS (b) produces blurred radiance field reconstruction and view rendering; the event-based NeRF approach (c) is time- and resource-consuming while still producing accurate rendering results; existing event-based GS approaches (d and e) fail on real captured data, as more noise and randomness are introduced in the event streams; our approach (f), leveraging the proposed multi-modal data utilization and supervision, renders sharp and accurate results compared to the ground truth references (g). We visually compare the results across all methods on real-world macro sequences (toy, optical component, keyboard, and cloth) captured by the hardware in Figure 3(a) and Figure 3(b), and on real-world microscopy sequences captured by the hardware in Figure 3(c). Dotted boxes indicate areas of interest, and solid boxes show zoomed-in results.
  • Figure 5: Additional visualizations of the rendered results on objects from different imaging settings. The top two rows are synthetic data. The middle four rows are real-world macro objects. The bottom two rows are real-world objects captured under the microscope. We show the rendering results from different viewpoints for each object, including unseen views of the keyboard and cloth data, which demonstrates that our approach reconstructs sharp unseen views with clear semantic information.
  • ...and 3 more figures
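
The turntable setup in Figure 3(a) keeps the camera fixed while the object rotates at a known constant speed, so a camera pose relative to the object can in principle be recovered from an event timestamp alone. The following is a minimal sketch of that idea under stated assumptions: the turntable axis is taken to be the y axis, the rotation starts at angle zero at `t = 0`, and the function names are hypothetical. The paper's actual calibration procedure may differ.

```python
import numpy as np


def rotation_about_y(angle_rad):
    """Rotation matrix for a right-handed rotation about the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def camera_position_in_object_frame(camera_position_world,
                                    angular_speed_rad_s, t):
    """Express a fixed camera position in the rotating object's frame at time t.

    If the object has rotated by theta = angular_speed * t, a world point p
    has object-frame coordinates R(theta)^T p, so the fixed camera appears
    to orbit the object in the opposite direction.
    """
    theta = angular_speed_rad_s * t
    camera_position_world = np.asarray(camera_position_world, dtype=np.float64)
    return rotation_about_y(theta).T @ camera_position_world
```

Evaluating this at the timestamps that bound each event window yields the virtual viewpoints from which a renderer would be queried.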