
EventHub: Data Factory for Generalizable Event-Based Stereo Networks without Active Sensors

Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Guillermo Gallego

Abstract

We propose EventHub, a novel framework for training deep event stereo networks without ground-truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either proxy annotations and proxy events through state-of-the-art novel view synthesis techniques, or simply proxy annotations when images are already paired with event data. Using the training set generated by our data factory, we repurpose state-of-the-art stereo models from the RGB literature to process event data, obtaining new event stereo models with unprecedented generalization capabilities. Experiments on widely used event stereo datasets support the effectiveness of EventHub and show how the same data distillation mechanism can improve the accuracy of RGB stereo foundation models in challenging conditions such as nighttime scenes.

Paper Structure

This paper contains 26 sections, 14 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: EventHub: LiDAR-free proxy data for robust event stereo. Our factory generates training data from multiple sources [tosi2023nerf, yeshwanth2023scannet, gehrig2021dsec] (top), allowing our E-FoundationStereo to match EMatch [zhang2025ematch] in-domain [gehrig2021dsec] and outperform it in generalization [chaney2023m3ed] (bottom).
  • Figure 2: Limitations of LiDAR-supervised real-world datasets. Despite their popularity [gehrig2021dsec, chaney2023m3ed, zhu2018multivehicle], LiDAR annotations remain sparse (A), poorly capture dynamic scenes (B–C), are prone to reprojection errors (D), and struggle on transparent or reflective surfaces (E).
  • Figure 3: Framework Overview: We obtain training data through two complementary approaches: (i) Event Data Factory: SVRaster [sun2025sparse] generates synthetic event stereo pairs and depth labels from sparse RGB images via virtual camera trajectories (left); (ii) Stereo Cross-Modal Distillation: existing RGB stereo models produce proxy depth labels for real event data in calibrated RGB-Event stereo setups (top right). (iii) Both data sources are combined in EventHub to train/adapt event stereo networks (bottom right).
  • Figure 4: Qualitative examples of events and proxy annotations by EventHub. From top to bottom, examples obtained from NeRF-Stereo [tosi2023nerf] and ScanNet++ [yeshwanth2023scannet] through novel view synthesis, and from DSEC [gehrig2021dsec] through cross-modal distillation.
  • Figure 5: Qualitative results on the DSEC dataset [gehrig2021dsec]. Predictions by E-FoundationStereo trained according to different protocols.
  • ...and 11 more figures