JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Duy-Tho Le; Chenhui Gou; Stavya Datta; Hengcan Shi; Ian Reid; Jianfei Cai; Hamid Rezatofighi

TL;DR

JRDB-PanoTrack presents a large-scale open-world panoptic segmentation and tracking dataset tailored for robotic perception in crowded human environments. It extends the JRDB dataset with synchronized 2D and 3D modalities, 428K panoptic masks, 27K tracking labels, and 71 classes (43 known for closed-world evaluation and 28 unknown for open-world testing), along with new OSPA-based metrics for panoptic segmentation and tracking. The authors provide four benchmarks (closed-world and open-world panoptic segmentation and tracking) and show that current state-of-the-art methods face substantial challenges in this setting, especially in open-world and multi-modal contexts. The work highlights the dataset’s potential to drive development of robust, generalizable perception systems for real-world robotic deployments and motivates future research in open-world evaluation and multi-modal fusion.

Abstract

Autonomous robot systems have attracted increasing research attention in recent years, and environment understanding is a crucial step for robot navigation, human-robot interaction, and decision-making. Real-world robot systems usually collect visual data from multiple sensors and are required to recognize numerous objects and their movements in complex human-crowded settings. Traditional benchmarks, with their reliance on single sensors and limited object classes and scenarios, fail to provide the comprehensive environmental understanding robots need for accurate navigation, interaction, and decision-making. As an extension of the JRDB dataset, we unveil JRDB-PanoTrack, a novel open-world panoptic segmentation and tracking benchmark, toward more comprehensive environmental perception. JRDB-PanoTrack includes (1) diverse data covering indoor and outdoor crowded scenes, as well as comprehensive synchronized 2D and 3D data modalities; (2) high-quality 2D spatial panoptic segmentation and temporal tracking annotations, with additional 3D label projections for further spatial understanding; (3) diverse object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation. Extensive evaluation of leading methods shows that our dataset poses significant challenges.

Paper Structure (10 sections, 1 equation, 5 figures, 13 tables)

Introduction
Related Work
The JRDB-PanoTrack Dataset
Dataset and Statistics
Benchmark and Metrics
Experiments
Panoptic Segmentation
Panoptic Tracking
3D Panoptic Segmentation & Tracking
Conclusion

Figures (5)

Figure 1: A panoramic frame (bottom) and panoptic annotation (top) from our JRDB-PanoTrack dataset. Our dataset features multi-label panoptic annotations, highlighted by the striped areas where multiple objects coexist. JRDB-PanoTrack also provides consistent tracking IDs for all thing classes across long periods of occlusion.
Figure 2: Distribution of object masks of thing (brown) and stuff (green) classes in JRDB-PanoTrack train and test sets, where x and y-axis indicate the class names and mask counts, respectively. Best viewed in color and zoomed in.
Figure 3: Word cloud of the most frequent classes seen through glass in JRDB-PanoTrack, with the size of the word proportional to the frequency of the class.
Figure 4: Track length distribution (top) and number of masks per frame (bottom) in the JRDB-PanoTrack dataset. Best viewed in color.
Figure 5: The count (top) and percentage (bottom) of Small, Medium, and Large masks in the JRDB-PanoTrack training and testing sets. Small masks are $\leq 32^2$ pixels, Large masks are $\geq 96^2$ pixels, and Medium masks fall in between. Image size is 752x480 (W x H).
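
The size buckets in the Figure 5 caption can be illustrated with a short sketch. It assumes the COCO-style reading in which Small means an area of at most 32^2 pixels and Large means an area of at least 96^2 pixels, with Medium strictly in between. The function names `mask_size_bucket` and `bucket_shares` are hypothetical and are not part of the JRDB-PanoTrack toolkit.

```python
# Assumed thresholds, read from the Figure 5 caption (areas in pixels).
SMALL_MAX_AREA = 32 ** 2   # 1024
LARGE_MIN_AREA = 96 ** 2   # 9216


def mask_size_bucket(area_px: int) -> str:
    """Return 'small', 'medium', or 'large' for a mask area in pixels."""
    if area_px < 0:
        raise ValueError("mask area must be non-negative")
    if area_px <= SMALL_MAX_AREA:
        return "small"
    if area_px >= LARGE_MIN_AREA:
        return "large"
    return "medium"


def bucket_shares(areas):
    """Count masks per bucket and return (counts, percentage shares)."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for area in areas:
        counts[mask_size_bucket(area)] += 1
    total = sum(counts.values())
    # Guard against an empty input so the shares are well defined.
    shares = {
        name: (100.0 * n / total if total else 0.0)
        for name, n in counts.items()
    }
    return counts, shares


if __name__ == "__main__":
    # Made-up example areas, not values from the dataset.
    counts, shares = bucket_shares([100, 5000, 20000, 9216])
    print(counts, shares)
```

Under these assumed thresholds, a Large mask covers at least 9216 of the 752 x 480 = 360,960 pixels in a frame, which is roughly 2.6% of the image.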
