Places: An Image Database for Deep Scene Understanding

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva

TL;DR

The paper introduces the Places Database, a scene-centric dataset of roughly 10 million images built to provide broad coverage and diversity for deep-learning-based scene understanding. It outlines a four-step construction pipeline that combines web image harvesting, crowdsourced labeling, bootstrapped classification, and disambiguation of similar categories, culminating in over 10.6 million labeled exemplars across 434 categories. The authors train and evaluate multiple CNN architectures on Places (including Places205 and Places365), showing that Places-CNN features outperform ImageNet-based features on scene-centric tasks. They also present a web demo and cross-dataset deep-feature comparisons that demonstrate the practical utility of Places, analyze dataset diversity, and visualize network units to illustrate how scene-centric representations differ from object-centric ones. Overall, Places provides a scalable, diverse ecosystem for advancing robust, real-world scene understanding and mitigates limitations of prior scene datasets.

Abstract

The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state-of-the-art Convolutional Neural Networks, we provide impressive baseline performances at scene classification. With its high coverage and high diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.

Paper Structure

This paper contains 17 sections, 1 equation, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Image samples from various categories of the Places Database. The dataset contains three macro-classes: Indoor, Nature, and Urban.
  • Figure 2: Image samples from four scene categories grouped by queries to illustrate the diversity of the dataset. For each query we show 9 annotated images.
  • Figure 3: Annotation interface in Amazon Mechanical Turk for selecting the correct exemplars of a scene from the downloaded images. The left panel shows the instructions given to the workers, in which we define positive and negative examples. The right panel shows the binary selection interface.
  • Figure 4: Annotation interface in Amazon Mechanical Turk for differentiating images from two similar categories. The left panel shows the instructions, in which we give several typical examples of each category. The right panel shows the selection interface, in which the worker assigns the shown image to one of the two classes or to neither.
  • Figure 5: Boundaries between place categories can be blurry, as some images are a mixture of different components. The images in this figure show a soft transition between a field and a forest. Although the images at the extremes can be easily classified as field or forest scenes, the middle images can be ambiguous.
  • ...and 9 more figures