Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang, Yuqi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen
TL;DR
Traffic monitoring scene parsing is hampered by a lack of dedicated annotated data and by domain gaps with driving datasets. The paper introduces TSP6K, a 6,000-image dataset with detailed semantic and instance annotations tailored to monitoring scenes, and evaluates leading scene parsing, instance segmentation, and unsupervised domain adaptation (UDA) methods on it. To address the challenges of parsing high-resolution, highly crowded traffic scenes, it proposes a detail refining decoder (DRD) that uses region tokens and cross-attention to refine region-specific details, achieving 75.8% mIoU and 58.4% iIoU on the validation set. The dataset and DRD provide a new benchmark and a practical tool for traffic flow analysis and domain-adaptation research in traffic monitoring contexts.
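For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from predicted and ground-truth label maps. It is not the paper's evaluation code: the function name `mean_iou`, the flat-list input format, and the `ignore_label=255` convention are assumptions made for illustration. Benchmarks usually accumulate the per-class counts over the whole validation set before averaging. The instance-level iIoU variant is not covered here.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over classes, for flat lists of integer class labels.

    Assumes every prediction lies in range(num_classes); ignore_label
    (hypothetical default 255) marks unlabeled ground-truth pixels.
    """
    inter = [0] * num_classes  # per-class true positives
    union = [0] * num_classes  # per-class TP + FP + FN
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled pixels are excluded from the metric
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1  # false negative for the true class
            union[p] += 1  # false positive for the predicted class
    # Average only over classes that occur in the prediction or ground truth.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, the last pixel unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so `score` is their average.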
Abstract
Traffic scene perception is a critically important computer vision task for building intelligent cities. To date, most existing datasets focus on autonomous driving scenes. We observe that models trained on those driving datasets often yield unsatisfactory results on traffic monitoring scenes. However, little effort has been put into improving traffic monitoring scene understanding, mainly due to the lack of specific datasets. To fill this gap, we introduce a specialized traffic monitoring dataset, termed TSP6K, containing images from the traffic monitoring scenario, with high-quality pixel-level and instance-level annotations. The TSP6K dataset captures more crowded traffic scenes, with several times more traffic participants than existing driving scenes. We perform a detailed analysis of the dataset and comprehensively evaluate popular existing scene parsing, instance segmentation, and unsupervised domain adaptation methods. Furthermore, considering the vast difference in instance sizes, we propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes owing to the proposed TSP6K dataset. Experiments show its effectiveness in parsing traffic monitoring scenes. Code and dataset are available at https://github.com/PengtaoJiang/TSP6K.
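The summary above states that the detail refining decoder relies on region tokens and cross-attention. As a rough illustration of that general mechanism only, here is a sketch of generic single-head scaled dot-product cross-attention in which a few token vectors query a set of pixel feature vectors. It is not the authors' decoder: the function name `cross_attention` is hypothetical, and learned projections, multiple heads, and normalization layers are all omitted.

```python
import math


def cross_attention(tokens, features):
    """Generic single-head scaled dot-product cross-attention (illustrative).

    tokens:   K query vectors of length D (stand-ins for region tokens).
    features: N key/value vectors of length D (stand-ins for pixel features).
    Returns K vectors, each a softmax-weighted average of the features.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in tokens:
        # Similarity between this token and every feature vector.
        scores = [scale * sum(q * f for q, f in zip(query, feat))
                  for feat in features]
        # Numerically stable softmax over the N features.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the features gives the updated token.
        refined.append([sum(w * feat[d] for w, feat in zip(weights, features))
                        for d in range(dim)])
    return refined


# Two toy "pixel" features and two toy tokens.
features = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.0, 0.0], [10.0, 0.0]]
out = cross_attention(tokens, features)
```

The all-zero token attends uniformly and returns the mean feature, while the second token, which aligns with the first feature, returns a vector dominated by that feature.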
