YCB-Ev 1.1: Event-vision dataset for 6DoF object pose estimation
Pavel Rojtberg, Thomas Pöllabauer
TL;DR
The paper presents YCB-Ev, the first real-world event-vision dataset with ground-truth 6DoF poses for the 21 YCB objects. It enables the evaluation of pose estimation across RGB-D and event modalities and supports cross-dataset analysis with YCB-V. Ground-truth poses are generated from RGB-D data using a robust calibration and tracking pipeline and then transferred to the event camera frame; tools for pose interpolation and event alignment are also provided. The dataset comprises 21 sequences (13,851 frames; 7 min 43 s of event data), 12 of which mirror the object arrangement of the YCB-V subset used in the BOP challenge, and it covers challenging conditions such as rapid motion, occlusions, and frame drops. A bias analysis shows that domain gaps between RGB-based training and event-domain evaluation persist, highlighting the need for event-focused training and annotation and motivating synthetic-data approaches. Overall, YCB-Ev enables direct evaluation of event-based 6DoF pose estimation and of cross-dataset generalization, advancing research at the intersection of neuromorphic sensing and 3D object pose estimation.
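Because ground-truth poses come from RGB-D frames while event data is asynchronous, a pose has to be estimated at arbitrary timestamps between two annotated frames. The sketch below illustrates one standard way to do this: spherical linear interpolation (SLERP) for rotation and linear interpolation for translation. It is a minimal sketch under stated assumptions, not necessarily the method implemented in the YCB-Ev tools: poses are assumed to be unit quaternions in (w, x, y, z) order plus translation vectors, and the helper name `interpolate_pose` is hypothetical.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # assumed layout: (w, x, y, z), unit norm
Vec3 = Tuple[float, float, float]


def slerp(q0: Quat, q1: Quat, u: float) -> Quat:
    """Spherical linear interpolation between unit quaternions at u in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip q1 to take the shorter arc
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation is stable here
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
    s1 = math.sin(u * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def interpolate_pose(t: float, t0: float, q0: Quat, p0: Vec3,
                     t1: float, q1: Quat, p1: Vec3) -> Tuple[Quat, Vec3]:
    """Hypothetical helper: pose at time t between frame poses at t0 < t1."""
    if not t0 <= t <= t1:
        raise ValueError("t must lie between the two annotated frame timestamps")
    u = (t - t0) / (t1 - t0)
    q = slerp(q0, q1, u)
    p = tuple(a + u * (b - a) for a, b in zip(p0, p1))
    return q, p


if __name__ == "__main__":
    # Halfway between identity and a 90-degree rotation about z: expect 45 degrees
    q_90z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    q, p = interpolate_pose(0.05, 0.0, (1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                            0.1, q_90z, (0.2, 0.0, 0.4))
    print(q, p)
```

Interpolating rotation on the quaternion sphere keeps intermediate rotations valid and at constant angular velocity, which component-wise interpolation of rotation matrices would not guarantee.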
Abstract
Our work introduces the YCB-Ev dataset, which contains synchronized RGB-D frames and event data that enable the evaluation of 6DoF object pose estimation algorithms using these modalities. The dataset provides ground-truth 6DoF object poses for the same 21 YCB objects used in the YCB-Video (YCB-V) dataset, allowing algorithm performance to be compared across datasets. It consists of 21 synchronized event and RGB-D sequences, totalling 13,851 frames (7 minutes and 43 seconds of event data). Notably, 12 of these sequences feature the same object arrangement as the YCB-V subset used in the BOP challenge. Ground-truth poses are generated by detecting objects in the RGB-D frames, interpolating the poses to align with the event timestamps, and then transferring them to the event coordinate frame using extrinsic calibration. Our dataset is the first to provide ground-truth 6DoF pose data for event streams. Furthermore, we evaluate the generalization capabilities of two state-of-the-art algorithms, pre-trained for the BOP challenge, on our novel YCB-V sequences. The dataset is publicly available at https://github.com/paroj/ycbev.
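The last annotation step, transferring a pose from the RGB camera frame to the event camera frame via extrinsic calibration, amounts to composing two rigid transforms: T_ev_obj = T_ev_rgb · T_rgb_obj. The sketch below shows this with 4x4 homogeneous matrices in plain Python. The names `T_ev_rgb` and `T_rgb_obj`, and the convention that `T_a_b` maps points from frame b into frame a, are assumptions for illustration and not the dataset's documented file format.

```python
from typing import List, Sequence, Tuple

Mat4 = List[List[float]]


def make_transform(r: Sequence[Sequence[float]], t: Sequence[float]) -> Mat4:
    """Build a 4x4 homogeneous transform from a 3x3 rotation r and translation t."""
    return [list(r[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    """Product a @ b of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transfer_pose(T_ev_rgb: Mat4, T_rgb_obj: Mat4) -> Mat4:
    """Object pose in the event camera frame (assumed convention: T_a_b maps b -> a)."""
    return matmul4(T_ev_rgb, T_rgb_obj)


def transform_point(T: Mat4, p: Sequence[float]) -> Tuple[float, float, float]:
    """Apply a homogeneous transform to a 3D point, e.g. an object model vertex."""
    x = [p[0], p[1], p[2], 1.0]
    return tuple(sum(T[i][k] * x[k] for k in range(4)) for i in range(3))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Hypothetical extrinsics: event camera offset 5 cm along x from the RGB camera
    T_ev_rgb = make_transform(identity, (0.05, 0.0, 0.0))
    # Hypothetical object pose: 0.8 m in front of the RGB camera
    T_rgb_obj = make_transform(identity, (0.0, 0.0, 0.8))
    T_ev_obj = transfer_pose(T_ev_rgb, T_rgb_obj)
    print(transform_point(T_ev_obj, (0.0, 0.0, 0.0)))
```

Keeping both transforms as homogeneous matrices means the same composition also works when the extrinsic calibration includes a rotation between the two sensors, not only a translation.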
