Event-based Camera Simulation using Monte Carlo Path Tracing with Adaptive Denoising

Yuta Tsuji; Tatsuya Yatagawa; Hiroyuki Kubo; Shigeo Morishima

TL;DR

This work addresses rendering event-based video from noisy frames produced by physics-based Monte Carlo path tracing, where denoising every frame directly is prohibitively expensive. The authors propose a thresholded weighted local regression (WLR) that detects brightness changes and runs the regression only where an event is likely, using a residual-based criterion with a $C^2$ threshold. The method remains robust on noisy data. They show that its event-detection quality is comparable to or better than that of a heavier WLR+ESIM baseline, while it performs the regression for only about $28\%$ of pixels and reduces runtime (e.g., from ~4.5 to ~3 minutes) on 240-frame sequences. This enables efficient simulation of event-based vision in physically-based rendering pipelines, and the method can be extended with non-linear regression or deeper denoising for further improvements.

Abstract

This paper presents an algorithm to obtain an event-based video from noisy frames given by physics-based Monte Carlo path tracing over a synthetic 3D scene. Given the nature of a dynamic vision sensor (DVS), rendering event-based video can be viewed as a process of detecting changes in noisy brightness values. We extend a denoising method based on weighted local regression (WLR) to detect brightness changes rather than applying denoising to every pixel. Specifically, we derive a threshold to determine the likelihood of event occurrence and reduce the number of times the regression is performed. Our method is robust to noisy video frames obtained from a few path-traced samples. Despite its efficiency, our method performs comparably to or even better than an approach that exhaustively denoises every frame.
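To make "detecting changes in brightness values" concrete, here is a minimal per-pixel sketch of the standard DVS event model that this kind of simulator builds on. It is not the authors' implementation. It assumes that brightness is compared in the log domain and that the reference level is reset whenever an event fires; the function name `detect_event`, its parameters, and the `eps` guard are hypothetical.

```python
import math


def detect_event(last_event_log_l, current_l, contrast_threshold, eps=1e-6):
    """Return (polarity, new_reference) for one pixel.

    last_event_log_l   -- log brightness stored when the last event fired
    current_l          -- current linear brightness (assumed non-negative)
    contrast_threshold -- the threshold C on the log-brightness change
    eps                -- hypothetical guard against log(0)
    """
    current_log_l = math.log(current_l + eps)
    delta = current_log_l - last_event_log_l
    if abs(delta) >= contrast_threshold:
        polarity = 1 if delta > 0 else -1
        # Assumption: the reference is reset to the current log brightness.
        return polarity, current_log_l
    # No event: keep the old reference level.
    return 0, last_event_log_l
```

Applied directly to path-traced frames with few samples, a test like this would fire on noise rather than on real brightness changes, which is the problem the regression-based criterion is meant to address.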

Paper Structure (8 sections, 1 theorem, 9 equations, 6 figures, 5 tables)

Introduction
Event-based cameras
Related work
Efficient Event-based Video Rendering
Background: weighted local regression for denoising
Reduced event detection by weighted local regression
Experiments
Conclusion

Key Result

Proposition 1

Assume that an event is detected when $\Delta L = p C$. Then the event can be detected by thresholding the difference of the residues $\Delta \mathcal{R} = \abs{ \hat{\mathcal{R}} - \mathcal{R} }$ with $C^2$.
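The following is a simplified one-dimensional sketch of the test that Proposition 1 describes, not the authors' code. It assumes a single scalar feature per sample, measures the vertical (brightness-axis) distance to the fitted line, and normalizes the residual by the sum of the weights. All function and variable names are hypothetical.

```python
def fit_weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = a * x + b (closed form, 1D feature)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx  # assumes the features are not all identical
    return a, my - a * mx


def weighted_residual(xs, ys, ws, a, b):
    """Weighted mean squared vertical distance between samples and the line."""
    sw = sum(ws)
    return sum(w * (y - (a * x + b)) ** 2 for w, x, y in zip(ws, xs, ys)) / sw


def event_by_residual(xs, ys, ws, new_center, contrast_threshold):
    """Threshold the residual difference with C^2.

    xs, ys, ws -- features, brightness values, and weights at the last event
    new_center -- (feature, brightness) of the central sample in the new frame
    """
    a, b = fit_weighted_line(xs, ys, ws)
    r_last = weighted_residual(xs, ys, ws, a, b)
    # Translate the line (same slope) so it passes through the new center.
    xc, yc = new_center
    b_shifted = yc - a * xc
    r_new = weighted_residual(xs, ys, ws, a, b_shifted)
    return abs(r_new - r_last) >= contrast_threshold ** 2
```

In this simplified setting, shifting the fitted line vertically by $\delta$ changes the weighted mean squared residual by exactly $\delta^2$, because the weighted residuals of a least-squares fit with an intercept sum to zero. Comparing the residual difference with $C^2$ is then equivalent to comparing $\abs{\delta}$ with $C$.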

Figures (6)

Figure 1: An illustration of our event detection method using weighted local regression. For the frame where the last event occurred, we regress a hyper-plane (i.e., a straight line in this figure) for a set of features $\{ \vb{x}^i \}$ centered at $\vb{x}^c$. Then, for a new frame, we translate the hyper-plane so that it passes through the new centroid feature $\hat{\vb{x}}^c$ and calculate the weighted mean squared distance between the translated hyper-plane and the feature vectors at the time of the last event.
Figure 2: Visual comparison of event-based video frames. In addition to the 1st, 100th, and 200th frames shown here, the full-length videos are available on our project page.
Figure 3: Comparison of precision, recall, and F1 scores of event detection. The distance threshold (i.e., $\tau = 5.0 \times 10^{-4}$) used to calculate the values in Table \ref{tab:quant-comparison} is depicted with a broken vertical line. Best viewed on screen.
Figure A3: The effect of the event detection threshold $C$ on the resulting event-based video (100th frame). As shown in this figure, our method obtains the results that are most similar to those of the reference, while WLR+ESIM suffers from dappled artifacts due to the inconsistency of the WLR models of successive frames. The corresponding evaluation scores for these results are shown in Table \ref{tab:effect-threshold}.
Figure A1: Visual comparison of event-based videos for "Living room" (left) and "Two boxes" (right) scenes. The 1st, 100th, and 200th frames are shown from left to right. The full-length videos for these results are available on our project page: https://github.com/0V/ESIM-AD.git. The results for the "Living room" scene show that our method is robust to the temporal incoherence of noisy pixel values, whereas WLR+ESIM suffers significantly from it. However, both our method and WLR+ESIM tend to detect wrong events due to antialiasing at the objects' occluding contours (see the results for the "Two boxes" scene).
...and 1 more figure
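For readers unfamiliar with the scores plotted in Figure 3, the sketch below computes precision, recall, and F1 from match counts. How detected events are matched to reference events under the distance threshold $\tau$ is not specified here, so the counts are assumed to be given. The function name and arguments are hypothetical.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute (precision, recall, F1) from event match counts.

    true_pos  -- detected events matched to a reference event
    false_pos -- detected events with no matching reference event
    false_neg -- reference events that were not detected
    """
    detected = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / expected if expected else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```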

Theorems & Definitions (2)

Proposition 1
proof
