D$^2$-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios

Zhengping Che, Guangyu Li, Tracy Li, Bo Jiang, Xuefeng Shi, Xinsheng Zhang, Ying Lu, Guobin Wu, Yan Liu, Jieping Ye

TL;DR

D$^2$-City addresses the need for a large-scale, richly annotated dashcam dataset tailored to real-world driving in China. It collects over 11k videos from hundreds of drivers, with 1,000 videos densely annotated for 12 object classes in both detection and tracking, plus keyframe annotations for the rest and a detection-interpolation task. The dataset emphasizes diversity across cities, weather, road types, and traffic scenarios, while implementing privacy safeguards such as license-plate and face blurring. It provides training, validation, and test splits and aims to spur new methods in perception and intelligent driving, including large-scale detection interpolation. The work positions D$^2$-City as a valuable benchmark for detection and tracking in diverse, real-world driving conditions and outlines plans to broaden coverage and annotations in future releases.

Abstract

Driving datasets accelerate the development of intelligent driving and related computer vision technologies, and substantial, detailed annotations make such datasets more effective for improving learning-based models. We propose D$^2$-City, a large-scale, comprehensive collection of dashcam videos recorded by vehicles on DiDi's platform. D$^2$-City contains more than 10,000 video clips that reflect the diversity and complexity of real-world traffic scenarios in China. We also provide bounding-box and tracking annotations for 12 classes of objects in all frames of 1,000 videos, and detection annotations on keyframes for the remainder of the videos. Compared with existing datasets, D$^2$-City features data collected under varying weather, road, and traffic conditions, together with a large volume of detailed detection and tracking annotations. By bringing a diverse set of challenging cases to the community, we expect the D$^2$-City dataset to advance perception and related areas of intelligent driving.

Paper Structure

This paper contains 19 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Sample video frames from the D$^2$-City data collection. D$^2$-City covers diverse real-world traffic scenarios in China, such as traffic congestion, crowded crossroads, narrow alleys, road construction, and scenes with large volumes of non-motor vehicles and pedestrians.
  • Figure 2: Distribution of videos across different times of day.
  • Figure 3: Statistics of the number of lanes in the driving direction in all videos.
  • Figure 4: Statistics of the numbers of intersections the ego-vehicles passed in all videos.
  • Figure 5: Distributions of the average, maximum, and minimum speed of the ego-vehicles in all videos.
  • ...and 4 more figures