Temporal Event Stereo via Joint Learning with Stereoscopic Flow

Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon

TL;DR

This work tackles depth estimation from event cameras by introducing Temporal Event Stereo. It uses a novel stereoscopic flow ($\mathcal{SF}$) to propagate information from past time steps to the present. The method jointly trains a stereo network and a lightweight stereoscopic-flow network. It warps both features and cost volumes, combines past and current information with an entropy-based fusion, and is guided by a temporal disparity consistency loss that does not require ground-truth optical flow. The approach achieves state-of-the-art results on MVSEC and DSEC while remaining efficient, because it reuses past information and keeps the flow module compact. Extensive ablations and qualitative analyses support these results. Overall, the paper shows that flow-guided temporal aggregation improves the accuracy and robustness of event-based stereo in dynamic scenes within practical computation budgets.
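The TL;DR mentions an entropy-based fusion but does not give its exact form. The following is a minimal sketch that makes two assumptions. First, each cost volume holds matching scores of shape (D, H, W), where a higher value means a better match. Second, at each pixel, the source whose distribution over disparity has lower entropy (a more peaked match) should receive more weight. The names `disparity_entropy` and `entropy_fuse` are hypothetical. The two-way softmax weighting is an illustrative choice, not the paper's formulation.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def disparity_entropy(cost, axis=0):
    """Per-pixel Shannon entropy of the softmax distribution over disparity.

    Assumes `cost` has shape (D, H, W) and holds matching scores (higher = better)."""
    p = softmax(cost, axis)
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

def entropy_fuse(cost_cur, cost_prev_warped):
    """Hypothetical fusion: blend the current and warped past cost volumes pixel-wise.

    The volume with lower entropy (a more confident match) gets the larger weight;
    the weights come from a 2-way softmax over negative entropies (an assumption)."""
    h_cur = disparity_entropy(cost_cur)           # (H, W)
    h_prev = disparity_entropy(cost_prev_warped)  # (H, W)
    w = softmax(np.stack([-h_cur, -h_prev]), axis=0)  # (2, H, W), sums to 1 per pixel
    return w[0][None] * cost_cur + w[1][None] * cost_prev_warped
```

In this sketch, a pixel whose current scores are flat (for example, where few events fired) leans on the warped past volume. A pixel with a sharp current match keeps mostly its own scores.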

Abstract

Event cameras are dynamic vision sensors inspired by the biological retina. They are characterized by high dynamic range, high temporal resolution, and low power consumption. These features make them capable of perceiving 3D environments even in extreme conditions. Event data is continuous along the time dimension, which allows a detailed description of each pixel's movement. To fully exploit the temporally dense and continuous nature of event cameras, we propose temporal event stereo, a novel framework that continuously uses information from previous time steps. This is accomplished by training an event stereo matching network jointly with stereoscopic flow, a new concept that captures the movement of every pixel in both stereo views. Since ground-truth optical flow is difficult to obtain for training, we propose a method that trains the stereoscopic flow using only disparity maps. Temporally aggregating information with these flows improves the performance of event-based stereo matching. We achieve state-of-the-art performance on the MVSEC and DSEC datasets. The method is also computationally efficient, because it accumulates previous information in a cascading manner. The code is available at https://github.com/mickeykang16/TemporalEventStereo.
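The abstract states that stereoscopic flow can be trained from disparity maps alone. The reason this can work: if the flow is correct, the previous disparity map, warped to the current time step and corrected by the disparity change, should reproduce the current disparity map. The NumPy sketch below illustrates that idea under several assumptions:

- a backward flow from $t$ to $t-1$ for the left view, given by $\Delta x^L$ and $\Delta y$, plus a disparity change $\Delta d$;
- the convention $d_{t-1}(x + \Delta x^L, y + \Delta y) = d_t(x, y) + \Delta d(x, y)$;
- dense ground truth;
- an L1 penalty.

The function names and these conventions are illustrative. This is not the paper's exact loss.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample an (H, W) map at float coordinates x, y (each (H, W)).

    Returns the sampled values and a mask of coordinates that fall inside the map."""
    H, W = img.shape
    valid = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    xc, yc = np.clip(x, 0, W - 1), np.clip(y, 0, H - 1)
    x0, y0 = np.floor(xc).astype(int), np.floor(yc).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = xc - x0, yc - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bot = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bot, valid

def temporal_disparity_consistency_loss(d_prev_gt, d_cur_gt, flow_x, flow_y, flow_d):
    """Hypothetical L1 loss that supervises flow using only ground-truth disparity.

    Assumed convention: d_{t-1}(x + flow_x, y + flow_y) = d_t(x, y) + flow_d(x, y),
    so warping d_{t-1} and subtracting flow_d should reproduce d_t."""
    H, W = d_cur_gt.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    d_prev_warped, valid = bilinear_sample(d_prev_gt, xs + flow_x, ys + flow_y)
    if not valid.any():
        return 0.0
    residual = np.abs(d_prev_warped - flow_d - d_cur_gt)
    # Only pixels whose flow lands inside the previous map contribute.
    return float(residual[valid].mean())
```

No optical-flow label appears anywhere in this loss. The flow receives gradients only through how well it aligns two disparity maps, which matches the supervision setting the abstract describes. With the sparse ground truth common in event datasets, an extra validity mask would also be needed.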

Paper Structure

This paper contains 28 sections, 10 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: Overview of the temporal event stereo. We can accurately and efficiently estimate dense disparity by propagating previously computed information to the present through stereoscopic flow ($\mathcal{SF}$).
  • Figure 2: Architecture overview of the proposed temporal event stereo. In each time step, information from the preceding moment is warped via stereoscopic flow and fused with the current information, boosting the intermediate representations such as feature map and cost volume. Stereoscopic flow training does not require the ground truth optical flow but receives a supervision signal from ground truth disparity.
  • Figure 3: Cost Volume Warping. Cost volume warping with 3-dimensional flow $\{\Delta d, \Delta x^L, \Delta y\}$ (left). Relation between disparity flow and stereoscopic flow (right).
  • Figure 4: Qualitative results on the Indoor Flying dataset of MVSEC. From top to bottom, the rows show $\#245$ from sequence 1, $\#591$ from sequence 1, $\#200$ from sequence 3, and $\#1585$ from sequence 3. The authors reproduced the baseline results using publicly available code (zhang2022discrete, tulyakov2019learning).
  • Figure 5: Qualitative results on the DSEC dataset. We use the pre-trained Se-CFF model (nam2022stereo) released by its authors, and we train DTC-SPADE (zhang2022discrete) using its public code.
  • ...and 3 more figures
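Figure 3 describes cost-volume warping with a 3-dimensional flow $\{\Delta d, \Delta x^L, \Delta y\}$ and relates disparity flow to stereoscopic flow. In a rectified pair, disparity is $d = x^L - x^R$ and both views share the same row. Moving from $t$ to $t-1$ therefore gives $\Delta d = \Delta x^L - \Delta x^R$, with a single $\Delta y$ for both views.

The NumPy sketch below makes three assumptions:

- the sign convention above;
- a cost volume of shape (D, H, W) with one disparity level per index;
- trilinear backward warping, with zeros outside the volume.

The names `disparity_flow` and `warp_cost_volume` are hypothetical, not the authors' code.

```python
import numpy as np

def disparity_flow(flow_x_left, flow_x_right):
    # d = x_L - x_R in a rectified pair, so the disparity change is the
    # difference of the horizontal flows of the left and right views.
    return flow_x_left - flow_x_right

def warp_cost_volume(cost_prev, flow_d, flow_x, flow_y):
    """Backward-warp a (D, H, W) cost volume from t-1 to t with trilinear interpolation.

    Target cell (k, y, x) reads cost_prev at (k + flow_d, y + flow_y, x + flow_x),
    where each flow is a per-pixel (H, W) map shared by all disparity levels.
    Samples that fall outside the volume are set to zero (an assumption)."""
    D, H, W = cost_prev.shape
    k, ys, xs = np.meshgrid(np.arange(D), np.arange(H), np.arange(W), indexing="ij")
    coords = [k + flow_d[None], ys + flow_y[None], xs + flow_x[None]]
    sizes = (D, H, W)
    valid = np.ones(cost_prev.shape, dtype=bool)
    lo, frac = [], []
    for c, n in zip(coords, sizes):
        valid &= (c >= 0) & (c <= n - 1)
        c = np.clip(c, 0, n - 1)
        f = np.floor(c).astype(int)
        lo.append(f)
        frac.append(c - f)
    out = np.zeros(cost_prev.shape, dtype=float)
    # Accumulate the 8 corners of the surrounding cell, weighted trilinearly.
    for corner in range(8):
        idx, w = [], np.ones(cost_prev.shape)
        for axis in range(3):
            bit = (corner >> axis) & 1
            idx.append(np.minimum(lo[axis] + bit, sizes[axis] - 1))
            w = w * (frac[axis] if bit else 1 - frac[axis])
        out += w * cost_prev[idx[0], idx[1], idx[2]]
    return np.where(valid, out, 0.0)
```

With zero flow, the sketch returns the input unchanged. A uniform $\Delta d = 1$ shifts every pixel's matching curve by one disparity level. The disparity axis is warped alongside the image plane because a pixel that moves in depth also changes which disparity hypothesis is correct.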