JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi
TL;DR
JRDB-PanoTrack is a large-scale open-world panoptic segmentation and tracking dataset tailored for robotic perception in crowded human environments. It extends the JRDB dataset with synchronized 2D and 3D modalities, 428K panoptic masks, 27K tracking labels, and 71 classes (43 known for closed-world evaluation and 28 unknown for open-world testing), along with new OSPA-based metrics for panoptic segmentation and tracking. The authors provide four benchmarks (closed-world and open-world panoptic segmentation, and closed-world and open-world panoptic tracking) and show that current state-of-the-art methods face substantial challenges in this setting, especially in open-world and multi-modal contexts. The work highlights the dataset’s potential to drive development of robust, generalizable perception systems for real-world robotic deployments and motivates future research in open-world evaluation and multi-modal fusion.
Abstract
Autonomous robot systems have attracted increasing research attention in recent years, and environment understanding is a crucial step for robot navigation, human-robot interaction, and decision-making. Real-world robot systems usually collect visual data from multiple sensors and are required to recognize numerous objects and their movements in complex human-crowded settings. Traditional benchmarks, with their reliance on single sensors and limited object classes and scenarios, fail to provide the comprehensive environmental understanding robots need for accurate navigation, interaction, and decision-making. As an extension of the JRDB dataset, we unveil JRDB-PanoTrack, a novel open-world panoptic segmentation and tracking benchmark, towards more comprehensive environmental perception. JRDB-PanoTrack includes (1) various data involving indoor and outdoor crowded scenes, as well as comprehensive 2D and 3D synchronized data modalities; (2) high-quality 2D spatial panoptic segmentation and temporal tracking annotations, with additional 3D label projections for further spatial understanding; (3) diverse object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation. Extensive evaluation of leading methods shows significant challenges posed by our dataset.
