Traffic Scene Parsing through the TSP6K Dataset

Peng-Tao Jiang, Yuqi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen

TL;DR

Traffic monitoring scene parsing is hampered by a lack of dedicated annotated data and by domain gaps with driving datasets. The paper introduces TSP6K, a 6,000-image dataset with detailed semantic and instance annotations tailored to monitoring scenes, and evaluates leading scene parsing, instance segmentation, and unsupervised domain adaptation (UDA) methods on it. To address parsing challenges in high-resolution, highly crowded traffic scenes, it proposes a detail refining decoder (DRD) that uses region tokens and cross-attention to refine region-specific details, achieving 75.8% mIoU and 58.4% iIoU on the validation set. The dataset and DRD provide a new benchmark and a practical tool for traffic flow analysis and domain-adaptation research in traffic monitoring contexts.
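For readers unfamiliar with the mIoU figure quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for semantic segmentation. It is not the paper's evaluation code: the function name `mean_iou`, the flat-list input format, and the `ignore_index=255` convention are assumptions made for illustration, and iIoU is deliberately left out because its exact definition is not given here.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the target.

    pred, target: flat sequences of integer class ids, one per pixel.
    ignore_index: label value skipped during evaluation (assumed convention).
    """
    inter = [0] * num_classes  # pixels where prediction and target agree
    union = [0] * num_classes  # pixels predicted as or labeled as the class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1  # missed pixel of class t
            if 0 <= p < num_classes:
                union[p] += 1  # false positive for class p
    # Classes absent from both prediction and target are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In practice the per-class intersections and unions are accumulated over the whole validation set before dividing, rather than averaged per image.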

Abstract

Traffic scene perception is a critical computer vision task for building intelligent cities. To date, most existing datasets focus on autonomous driving scenes. We observe that models trained on those driving datasets often yield unsatisfactory results on traffic monitoring scenes. However, little effort has gone into improving traffic monitoring scene understanding, mainly because dedicated datasets are lacking. To fill this gap, we introduce a specialized traffic monitoring dataset, termed TSP6K, containing images from traffic monitoring scenarios with high-quality pixel-level and instance-level annotations. The TSP6K dataset captures more crowded traffic scenes, with several times more traffic participants than existing driving scenes. We perform a detailed analysis of the dataset and comprehensively evaluate popular scene parsing methods, instance segmentation methods, and unsupervised domain adaptation methods. Furthermore, building on the TSP6K dataset and considering the vast difference in instance sizes, we propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes. Experiments show its effectiveness in parsing traffic monitoring scenes. Code and dataset are available at https://github.com/PengtaoJiang/TSP6K.

Paper Structure

This paper contains 24 sections, 6 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Examples are randomly picked from the TSP6K dataset. Each image is associated with its corresponding semantic label and instance label. We have masked the vehicle plates for privacy protection.
  • Figure 2: (a) Class and scene information of the TSP6K dataset. (b) The geographic distribution of the scenes and images.
  • Figure 3: Data analysis of the TSP6K dataset. (a) The distribution of the number of instances in each image. (b) The distribution of the instance sizes. (c) The number of instances for each category.
  • Figure 4: Pipeline of the detail refining decoder. Our decoder contains two parts. The first part is similar to the decoder presented in DeepLabV3+ [chen2018encoder]. Unlike that decoder, we fuse the feature maps from ASPP with the feature maps from the third stage ($8\times$ downsampling relative to the input). The second part is the proposed region refining module.
  • Figure 5: Visualizations of the attention map corresponding to each token. We randomly select several tokens for visualization. One can see that the visualizations associated with different region tokens focus on different semantic regions. These region tokens can help our method better process the region details.
  • ...and 2 more figures
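
The region tokens and per-token attention maps mentioned in the captions for Figures 4 and 5 can be illustrated with a generic cross-attention sketch. This is not the paper's region refining module: it omits learned query/key/value projections, multi-head splitting, and any refinement layers, and the names `softmax` and `cross_attention` are hypothetical helpers written in plain Python to stay dependency-free. It only shows how each token produces one attention map over pixel positions and then aggregates features with it.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(tokens, features):
    """Scaled dot-product cross-attention without learned projections.

    tokens:   K region tokens, each a list of d floats (queries).
    features: N pixel features, each a list of d floats (keys and values).
    Returns (attn, updated): attn is K x N, one attention map per token;
    updated is K x d, each token replaced by its attention-weighted feature mix.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    attn = []
    for q in tokens:
        scores = [scale * sum(qi * fi for qi, fi in zip(q, f)) for f in features]
        attn.append(softmax(scores))
    updated = [
        [sum(a * f[c] for a, f in zip(row, features)) for c in range(d)]
        for row in attn
    ]
    return attn, updated


if __name__ == "__main__":
    # Two toy tokens and three toy pixel features of dimension 2.
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    features = [[5.0, 0.0], [0.0, 5.0], [5.0, 0.0]]
    attn, updated = cross_attention(tokens, features)
    for row in attn:
        print([round(a, 3) for a in row])
```

Reshaping one row of `attn` back to the spatial size of the feature map gives a per-token map of the kind visualized in Figure 5, where different tokens concentrate on different regions.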