
Geometric-Photometric Event-based 3D Gaussian Ray Tracing

Kai Kohyama, Yoshimitsu Aoki, Guillermo Gallego, Shintaro Shiba

TL;DR

This work addresses the challenge of exploiting the high temporal resolution of event cameras for 3D Gaussian Splatting (3DGS) by decoupling rendering into two branches: event-by-event depth (geometry) and snapshot-based radiance (appearance). It introduces a differentiable, ray-traced event-based 3DGS framework whose two branches are connected by the image of warped events. The framework combines a geometric (Contrast Maximization) loss with a photometric loss and uses an initialization that does not rely on pretrained models or COLMAP. The method achieves state-of-the-art performance on real-world datasets and competitive results on synthetic data, with significantly faster training and robustness to the number of events used. It offers a practical path toward high-temporal-resolution 3D reconstruction from sparse event data without external priors.
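
The geometric loss builds on Contrast Maximization (CMax): when events are warped to a common reference time with the correct motion, they pile up on edges and the image of warped events (IWE) becomes sharper. The NumPy sketch below illustrates only that idea, under simplifying assumptions of ours: per-event optical flow (pixels per unit time) is already known and constant over the time window, events vote bilinearly, and image variance serves as the contrast measure. The function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def warp_events(xs, ys, ts, flow, t_ref):
    """Move each event along its own flow (pixels per unit time) to time t_ref."""
    dt = t_ref - ts
    return xs + flow[:, 0] * dt, ys + flow[:, 1] * dt


def image_of_warped_events(xw, yw, height, width, weights=None):
    """Accumulate warped events into an image with bilinear voting.

    `weights` can carry polarities; by default every event votes with 1.
    Votes that fall outside the image are dropped.
    """
    if weights is None:
        weights = np.ones_like(xw)
    iwe = np.zeros((height, width))
    x0 = np.floor(xw).astype(int)
    y0 = np.floor(yw).astype(int)
    fx, fy = xw - x0, yw - y0
    # Each event splits its vote among the four surrounding pixels.
    for dx, dy, w in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
        xi, yi = x0 + dx, y0 + dy
        inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
        np.add.at(iwe, (yi[inside], xi[inside]), (weights * w)[inside])
    return iwe


def contrast_loss(iwe):
    """Negative IWE variance: a lower loss means a sharper, better-aligned image."""
    return -np.var(iwe)
```

For example, events produced by an edge point moving at 100 px/s collapse onto a single pixel when warped with that flow, which gives a lower loss than warping them with zero flow. Because the warped coordinates enter the IWE through the bilinear weights, writing the same computation in an autodiff framework would let gradients flow back to the flow, and through it to whatever produces the flow (per-event depth, in the setting described here).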

Abstract

Event cameras offer higher temporal resolution than traditional frame-based cameras, which makes them suitable for motion and structure estimation. However, it has been unclear how event-based 3D Gaussian Splatting (3DGS) approaches could leverage the fine-grained temporal information of sparse events. This work proposes a framework to address the trade-off between accuracy and temporal resolution in event-based 3DGS. Our key idea is to decouple the rendering into two branches, event-by-event geometry (depth) rendering and snapshot-based radiance (intensity) rendering, by using ray tracing and the image of warped events. Extensive evaluation shows that our method achieves state-of-the-art performance on the real-world datasets and competitive performance on the synthetic dataset. Moreover, the proposed method works without prior information (e.g., pretrained image reconstruction models) or COLMAP-based initialization, is more flexible in the number of selected events, and achieves sharp reconstruction at scene edges with short training time. We hope that this work deepens our understanding of the sparse nature of events for 3D reconstruction. The code will be released.
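
Event-by-event geometry rendering amounts to casting one ray per event pixel through the Gaussians and compositing a depth along it. A minimal brute-force sketch, assuming a 3D Gaussian Ray Tracing-style evaluation in which each Gaussian contributes at its point of maximum response along the ray, followed by front-to-back alpha compositing of the hit distances. The paper's exact ray-tracing kernel is not described here, so this formulation and all function names are our assumptions.

```python
import numpy as np


def ray_from_pixel(x, y, K):
    """Back-project pixel (x, y) to a camera-frame ray direction with z = 1."""
    return np.linalg.solve(K, np.array([x, y, 1.0]))


def render_event_depth(origin, direction, means, covs, opacities, eps=1e-8):
    """Alpha-composited depth of 3D Gaussians along a single ray.

    Each Gaussian is evaluated where its response along the ray peaks
    (an assumed 3DGRT-style choice); hits are composited front to back.
    Returns 0 if no Gaussian lies in front of the ray origin.
    """
    inv_covs = np.linalg.inv(covs)                      # (M, 3, 3)
    diff = means - origin                               # (M, 3)
    Sd = inv_covs @ direction                           # (M, 3): Sigma^-1 d
    # Ray parameter of maximum response: (mu - o)^T Sigma^-1 d / d^T Sigma^-1 d
    t_hit = np.einsum("mi,mi->m", diff, Sd) / np.einsum("i,mi->m", direction, Sd)
    points = origin + t_hit[:, None] * direction        # (M, 3)
    r = points - means
    mahal = np.einsum("mi,mij,mj->m", r, inv_covs, r)   # squared Mahalanobis distance
    alphas = opacities * np.exp(-0.5 * mahal)
    keep = t_hit > 0                                    # ignore Gaussians behind the camera
    order = np.argsort(t_hit[keep])
    t_sorted = t_hit[keep][order]
    a_sorted = alphas[keep][order]
    # Transmittance before each hit: product of (1 - alpha) of all closer hits.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - a_sorted)[:-1]))
    weights = transmittance * a_sorted
    return float(np.sum(weights * t_sorted) / (np.sum(weights) + eps))
```

Because the pixel ray has a unit z-component, the returned value is z-depth in the camera frame. A practical renderer would typically replace this brute-force evaluation of every Gaussian with a bounding-volume hierarchy traversed on the GPU.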

Paper Structure

This paper contains 26 sections, 10 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Overview of the proposed method, which takes raw events and poses as input. During the optimization of the 3D Gaussians, rendering is decoupled into two pathways: event-by-event (temporally dense) depth rendering and spatially dense intensity rendering. We use the image of warped events to connect these two pathways and to compute both geometric and photometric losses. The color results are from a synthetic dataset, and the monochrome results are from two standard real-world datasets.
  • Figure 2: Method overview. Using a ray-tracing renderer, we estimate the depth of each event and compute its flow (i.e., the motion field) with the interpolated poses. Warping the events produces the image of warped events at $t_\text{mid}$, from which the contrast loss is computed. We also render the dense intensity (radiance) at $t_\text{mid}$ and compute the instantaneous brightness increment image, which is used for the photometric loss.
  • Figure 3: Visualization of dense/sparse depth and optical flow. The sparse depth and optical flow are not obtained by simply masking their dense counterparts, but by actual event-by-event ray tracing (see the ray-tracing section of the method). Top: real events (EDS). Bottom: synthetic events. The flow color notation is specified in Figure 2 (method overview).
  • Figure 4: Results on the real-world datasets EDS [Hidalgo22cvpr] and TUM-VIE [Klenk21iros]. In the TUM-VIE dataset, the event camera's field of view is narrower in the vertical direction than that of the ground-truth (GT) frame camera.
  • Figure 5: Qualitative results on the color synthetic dataset [Low23iccv].
  • ...and 5 more figures
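
Figure 2 describes computing per-event flow from the rendered depth and interpolated poses. A minimal sketch of that geometric step, under assumptions of ours: camera-to-world poses, linear interpolation of the camera position combined with geodesic (SO(3)) interpolation of the rotation, a pinhole intrinsic matrix `K`, and a finite-difference flow between the event time and a reference time such as $t_\text{mid}$. All function names are hypothetical.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix such that hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)
    W = hat(w / theta)
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)


def so3_log(R):
    """Rotation matrix -> axis-angle vector (numerically safe for angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-12:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v


def interpolate_pose(t, t0, R0, p0, t1, R1, p1):
    """Camera-to-world pose at time t: geodesic in rotation, linear in position."""
    s = (t - t0) / (t1 - t0)
    R = R0 @ so3_exp(s * so3_log(R0.T @ R1))
    p = (1.0 - s) * p0 + s * p1
    return R, p


def warp_event_to_ref(x, y, depth, K, R_k, p_k, R_ref, p_ref):
    """Reproject an event pixel with known z-depth from its pose to the reference pose."""
    ray = np.linalg.solve(K, np.array([x, y, 1.0]))  # camera-frame ray with z = 1
    P_world = R_k @ (depth * ray) + p_k               # 3D point in world frame
    P_ref = R_ref.T @ (P_world - p_ref)               # same point, reference camera frame
    uvw = K @ P_ref
    return uvw[:2] / uvw[2]


def event_flow(x, y, t_k, xy_ref, t_ref):
    """Finite-difference pixel velocity (pixels per unit time) of one event."""
    return (np.asarray(xy_ref) - np.array([x, y])) / (t_ref - t_k)
```

For instance, with a camera translating by one unit along x over one unit of time, an event at the principal point with depth 10 and focal length 100 moves 5 pixels to the left by the midpoint, i.e., a flow of -10 pixels per unit time. Because the reprojection uses the full pose rather than a constant-velocity approximation, it remains valid for larger rotations between the event time and the reference time.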