A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment

Xiaoqian Huang, Kachole Sanket, Abdulla Ayyad, Fariborz Baghaei Naeini, Dimitrios Makris, Yahya Zweiri

TL;DR

This work introduces the Event-based Segmentation Dataset (ESD), a high-quality 3D spatial–temporal benchmark for object segmentation in indoor clutter using stereo neuromorphic cameras and an RGBD sensor. It provides 145 sequences with 14,166 annotated RGB frames and over 20 million events per camera, with depth-aligned event labels and manual RGB masks enabling both instance and semantic segmentation. Comprehensive evaluations across varying trajectories, speeds, lighting, distances, and occlusions reveal that event data substantially boosts segmentation, especially under challenging conditions like motion blur or low illumination, and that cross-modal fusion with RGB improves performance for known objects but struggles with unseen ones. By releasing ESD, the authors offer a challenging, labeled, multi-modal dataset to advance neuromorphic segmentation for robotic perception and manipulation in unstructured indoor environments.

Abstract

Event-based cameras can address the motion blur, low dynamic range, and low temporal sampling rate of standard cameras. However, there is a lack of event-based datasets dedicated to benchmarking segmentation algorithms, especially datasets that provide depth information, which is critical for segmentation in occluded scenes. This paper proposes the Event-based Segmentation Dataset (ESD), a high-quality 3D spatial and temporal dataset for object segmentation in an indoor cluttered environment. ESD comprises 145 sequences with 14,166 RGB frames that are manually annotated with instance masks. In total, 21.88 million and 20.80 million events were collected from the two event-based cameras, respectively, which are mounted in a stereo configuration. To the best of our knowledge, this densely annotated, 3D spatial-temporal event-based segmentation benchmark of tabletop objects is the first of its kind. By releasing ESD, we expect to provide the community with a challenging, high-quality segmentation benchmark.

Paper Structure (5 sections, 6 equations, 12 figures, 3 tables, 1 algorithm)

Figures (12)

  • Figure 1: Hardware setup. Left: the three cameras are fixed on the end-effector of the UR10 manipulator. Right: camera configuration, with the Intel D435 RGBD camera in the middle and two Davis 346c event-based cameras mounted on the left and right sides, each tilted 5 degrees towards the middle.
  • Figure 2: Designed moving trajectories in $x-y-r$ space, where $x-y$ is the plane in which the cameras move and the $r$ axis denotes rotation.
  • Figure 3: Two steps of labeling blurred images: initial annotation and re-annotation. If wrong labels appear in the event frame, a second round of RGB mask labeling is triggered, guided by the initially annotated events.
  • Figure 4: Principle of mapping an interval of events onto the RGB frame coordinates for annotation.
  • Figure 5: Example from ESD-1 for the number-of-objects attribute, under a moving speed of 0.15, normal lighting, linear movement, and a height of 0.82. Different colors in the RGB ground truth and the annotated event mask indicate different labels. Best viewed in color.
  • ...and 7 more figures