LiDAR-Event Stereo Fusion with Hallucinations
Luca Bartolomei, Matteo Poggi, Andrea Conti, Stefano Mattoccia
TL;DR
The paper tackles depth estimation with event cameras, which yield only sparse or semi-dense data because few events are triggered in textureless or motionless regions. It introduces LiDAR-assisted fusion through two hallucination-based mechanisms, Virtual Stack Hallucination (VSH) and Back-in-Time Hallucination (BTH), which inject depth hints either into stacked event representations or directly into the event history, preserving the microsecond resolution of events. On the DSEC and M3ED datasets, VSH and BTH consistently outperform RGB-LiDAR fusion baselines, with BTH often achieving the best 1-pixel error (1PE) and mean absolute error (MAE), and both remain robust when the LiDAR data is not synchronized with the events. The work advances practical, high-precision depth estimation for fast-motion scenarios by leveraging sparse depth cues without sacrificing temporal fidelity, and it provides a general framework applicable to multiple stacked representations.
Abstract
Event stereo matching is an emerging technique to estimate depth from neuromorphic cameras; however, events are unlikely to trigger in the absence of motion or in the presence of large, untextured regions, making the correspondence problem extremely challenging. To this end, we propose integrating a stereo event camera with a fixed-frequency active sensor -- e.g., a LiDAR -- collecting sparse depth measurements, overcoming the aforementioned limitations. Such depth hints are used by hallucinating the stacks or raw input streams -- i.e., inserting fictitious events into them -- compensating for the lack of information in the absence of brightness changes. Our techniques are general, can be adapted to any structured representation used to stack events, and outperform state-of-the-art fusion methods applied to event-based stereo.
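As a rough illustration of what "inserting fictitious events" into the raw streams could look like, the sketch below adds one matching event pair per depth hint: an event at (x, y) in the left stream and one at (x - d, y) in the right stream, sharing timestamp and polarity. This is not the authors' implementation. The function name `hallucinate_events`, the (x, y, t, p) column layout, the single injection timestamp `t_inject`, the assumption of rectified views with hints already expressed as left-view disparities, and the random polarity choice are all assumptions made for the example.

```python
import numpy as np


def hallucinate_events(left, right, hints, t_inject, width, rng=None):
    """Insert one fictitious event pair per sparse disparity hint.

    left, right : float arrays of shape (N, 4) with columns (x, y, t, p).
    hints       : array of shape (K, 3) with columns (x, y, disparity),
                  given in the left view (hypothetical layout).
    t_inject    : timestamp assigned to every fictitious event.
    width       : sensor width in pixels, used to drop out-of-view matches.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    hints = np.asarray(hints, dtype=np.float64).reshape(-1, 3)
    x, y, d = hints[:, 0], hints[:, 1], hints[:, 2]

    # In rectified stereo, the right-view match lies on the same row at x - d.
    x_right = np.round(x - d)
    valid = (d > 0) & (x_right >= 0) & (x_right < width)
    x, y, x_right = x[valid], y[valid], x_right[valid]

    # Same polarity and timestamp on both sides, so the pair is consistent.
    k = x.size
    p = rng.choice([-1.0, 1.0], size=k)
    t = np.full(k, float(t_inject))
    fake_left = np.stack([x, y, t, p], axis=1)
    fake_right = np.stack([x_right, y, t, p], axis=1)

    # Merge with the real events and keep each stream ordered by time.
    new_left = np.concatenate([left, fake_left])
    new_right = np.concatenate([right, fake_right])
    new_left = new_left[np.argsort(new_left[:, 2], kind="stable")]
    new_right = new_right[np.argsort(new_right[:, 2], kind="stable")]
    return new_left, new_right


# Usage: one real event per view, two hints (the second falls outside the image).
left = np.array([[5.0, 5.0, 0.10, 1.0]])
right = np.array([[3.0, 5.0, 0.10, 1.0]])
hints = np.array([[10.0, 4.0, 3.0], [1.0, 2.0, 5.0]])
new_left, new_right = hallucinate_events(left, right, hints, t_inject=0.05, width=640)
```

Because the fictitious events are written into the streams themselves rather than into a particular stacked representation, any downstream stacking scheme would see them like ordinary events; where and when to inject them is the design question the paper's two mechanisms address.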
