The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts; Mohamed Omran; Sebastian Ramos; Timo Rehfeld; Markus Enzweiler; Rodrigo Benenson; Uwe Franke; Stefan Roth; Bernt Schiele

TL;DR

Cityscapes addresses the need for large-scale, diverse urban-scene data by introducing a benchmark with dense pixel-level and instance-level annotations across 50 cities, complemented by stereo depth information. The authors quantify dataset characteristics, provide robust evaluation metrics (IoU, iIoU, AP), and conduct extensive baselines and cross-dataset analyses to reveal how urban scenes differ from generic datasets. Key contributions include the largest richly annotated urban dataset to date, a rigorous evaluation framework for pixel- and instance-level tasks, and insights into how coarse labels, downsampling, and proposal quality impact performance. The work underscores the importance of high-resolution, variable-condition data for advancing semantic and instance-level understanding in real-world driving scenarios, and it sets the stage for future dataset expansions and method development.
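As a rough illustration of the IoU metric named above, the following is a minimal sketch that assumes the standard per-class definition IoU = TP / (TP + FP + FN), computed over pixel labels. The function name `class_iou` and the toy label lists are hypothetical and are not taken from the benchmark's own evaluation scripts.

```python
def class_iou(pred, truth, cls):
    """Per-class IoU over flat label sequences: TP / (TP + FP + FN).

    Assumption: pred and truth are equal-length sequences of integer
    class ids, one entry per pixel (hypothetical toy format).
    """
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == cls and t == cls)  # correctly labeled pixels
    fp = sum(1 for p, t in pairs if p == cls and t != cls)  # predicted cls, truth differs
    fn = sum(1 for p, t in pairs if p != cls and t == cls)  # truth is cls, prediction missed
    denom = tp + fp + fn
    # Class absent from both prediction and ground truth: IoU is undefined.
    return tp / denom if denom else float("nan")


# Toy example: six "pixels", three classes.
truth = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 2, 2]
ious = {c: class_iou(pred, truth, c) for c in (0, 1, 2)}
mean_iou = sum(ious.values()) / len(ious)
```

Averaging the per-class values, as in `mean_iou`, gives a single summary score; in practice the counts would be accumulated over all test images before dividing.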

Abstract

Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities. Of these images, 5000 have high-quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.
