Temporal Event Stereo via Joint Learning with Stereoscopic Flow
Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon
TL;DR
This work tackles depth estimation from event cameras by introducing Temporal Event Stereo, which uses a novel stereoscopic flow $\mathcal{SF}$ to propagate information from past time steps to the present. The method jointly trains a stereo matching network and a lightweight stereoscopic-flow network. It warps features and cost volumes from earlier time steps, fuses them with an entropy-based scheme, and is guided by a temporal disparity consistency loss that requires no ground-truth optical flow. The approach achieves state-of-the-art results on MVSEC and DSEC while remaining efficient, because it reuses past information and keeps the flow module compact; extensive ablations and qualitative analyses support these results. Overall, the paper demonstrates that flow-guided temporal aggregation improves the accuracy and robustness of event-based stereo in dynamic scenes while keeping computation practical.
Abstract
Event cameras are dynamic vision sensors inspired by the biological retina, characterized by their high dynamic range, high temporal resolution, and low power consumption. These features make them capable of perceiving 3D environments even in extreme conditions. Event data is continuous across the time dimension, which allows a detailed description of each pixel's movements. To fully utilize the temporally dense and continuous nature of event cameras, we propose a novel temporal event stereo, a framework that continuously uses information from previous time steps. This is accomplished through the simultaneous training of an event stereo matching network alongside stereoscopic flow, a new concept that captures all pixel movements from stereo cameras. Since obtaining ground truth for optical flow during training is challenging, we propose a method that uses only disparity maps to train the stereoscopic flow. The performance of event-based stereo matching is enhanced by temporally aggregating information using the flows. We have achieved state-of-the-art performance on the MVSEC and the DSEC datasets. The method is computationally efficient, as it stacks previous information in a cascading manner. The code is available at https://github.com/mickeykang16/TemporalEventStereo.
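To make the idea of training a flow from disparity maps alone more concrete, the following is a minimal NumPy sketch of a temporal disparity consistency check. It is an illustration under stated assumptions, not the authors' implementation: the function names, the backward-flow convention, the separate per-pixel disparity-change term, and the nearest-neighbor sampling are all hypothetical choices made here for clarity. A trainable version would presumably use a differentiable sampler inside a deep learning framework.

```python
import numpy as np


def warp_disparity(prev_disp, flow_xy, disp_change):
    """Warp the previous disparity map into the current time step.

    Assumptions (hypothetical, for illustration only):
      - flow_xy has shape (H, W, 2) and stores, for every current pixel,
        the (dx, dy) offset to its location at the previous time step.
      - disp_change has shape (H, W) and stores the predicted change in
        disparity along that motion.
      - Nearest-neighbor sampling is used instead of bilinear sampling.
    """
    h, w = prev_disp.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Source coordinates in the previous disparity map.
    src_x = np.rint(xs + flow_xy[..., 0]).astype(int)
    src_y = np.rint(ys + flow_xy[..., 1]).astype(int)

    # Pixels whose source falls outside the image carry no supervision.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)

    warped = prev_disp[src_y, src_x] + disp_change
    return warped, valid


def temporal_disparity_consistency(prev_disp, curr_disp, flow_xy, disp_change):
    """Mean absolute difference between warped and current disparity."""
    warped, valid = warp_disparity(prev_disp, flow_xy, disp_change)
    if not valid.any():
        return 0.0
    return float(np.abs(warped - curr_disp)[valid].mean())


# Toy example: the scene shifts one pixel to the right between time steps.
prev_disp = np.tile(np.arange(5, dtype=float), (4, 1))
curr_disp = np.roll(prev_disp, 1, axis=1)

good_flow = np.zeros((4, 5, 2))
good_flow[..., 0] = -1.0  # each current pixel came from one pixel to the left
zero_flow = np.zeros((4, 5, 2))
no_change = np.zeros((4, 5))

loss_good = temporal_disparity_consistency(prev_disp, curr_disp, good_flow, no_change)
loss_zero = temporal_disparity_consistency(prev_disp, curr_disp, zero_flow, no_change)
print(loss_good, loss_zero)
```

In this toy setup the flow that matches the true motion yields a lower loss than the zero flow, which is the signal such a loss would provide: only disparity maps at two time steps are needed, and no optical-flow ground truth appears anywhere.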
