Places: An Image Database for Deep Scene Understanding
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
TL;DR
The paper introduces the Places Database, a scene-centric dataset of more than 10 million images spanning over 400 scene categories, built to provide broad coverage and diversity for deep learning-based scene understanding. It outlines a four-step construction pipeline (web image harvesting, crowdsourced labeling, bootstrapped classification, and disambiguation of similar categories) that yields over 10.6 million labeled exemplars across 434 categories. The authors train and evaluate multiple CNN architectures on Places benchmarks (including Places205 and Places365), showing that Places-CNN features outperform ImageNet-based features on scene-centric tasks. They also present a web demo and cross-dataset comparisons of deep features that demonstrate Places’ practical utility. In addition, they analyze dataset diversity and visualize network units to illustrate how scene-centric representations differ from object-centric ones. Overall, Places provides a scalable, diverse ecosystem for advancing robust, real-world scene understanding and mitigates limitations of prior scene datasets.
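The bootstrapped-classification step can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each image is already reduced to a feature vector, substitutes a simple nearest-centroid scorer for the CNN classifier, and uses a hypothetical `verify` callback in place of crowdsourced annotation. The function names (`centroid`, `score`, `bootstrap_round`, `bootstrap`) are illustrative only. Each round trains on the verified labels, ranks the unlabeled pool by confidence, and sends only the top `k` candidates for verification.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: each image is represented by a precomputed feature vector.
Vector = Tuple[float, ...]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def score(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Nearest-centroid confidence: positive when x is closer to the positive centroid."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return d_neg - d_pos


def bootstrap_round(
    positives: List[Vector],
    negatives: List[Vector],
    pool: List[Vector],
    verify: Callable[[Vector], bool],
    k: int,
) -> Tuple[List[Vector], List[Vector], List[Vector]]:
    """Train on verified labels, rank the unlabeled pool, send the top k for verification."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    ranked = sorted(pool, key=lambda x: score(x, pos_c, neg_c), reverse=True)
    batch, rest = ranked[:k], ranked[k:]
    # verify() is a hypothetical stand-in for crowd annotation; call it once per candidate.
    verdicts = [(x, verify(x)) for x in batch]
    new_pos = positives + [x for x, ok in verdicts if ok]
    new_neg = negatives + [x for x, ok in verdicts if not ok]
    return new_pos, new_neg, rest


def bootstrap(
    positives: List[Vector],
    negatives: List[Vector],
    pool: List[Vector],
    verify: Callable[[Vector], bool],
    k: int,
    max_rounds: int,
) -> Tuple[List[Vector], List[Vector], List[Vector]]:
    """Repeat rounds until the pool is exhausted or max_rounds is reached."""
    for _ in range(max_rounds):
        if not pool:
            break
        positives, negatives, pool = bootstrap_round(positives, negatives, pool, verify, k)
    return positives, negatives, pool


if __name__ == "__main__":
    # Toy 2-D "features"; the verifier accepts anything with x[0] > 0 as the target scene.
    pool = [(0.9, 1.1), (1.3, 0.8), (-1.0, -0.9), (0.5, 0.6), (-0.7, -1.0)]
    pos, neg, rest = bootstrap(
        [(1.0, 1.0)], [(-1.0, -1.0)], pool, lambda x: x[0] > 0, k=2, max_rounds=5
    )
    print(len(pos), len(neg), len(rest))  # prints: 4 3 0
```

Ranking by classifier confidence before verification concentrates annotator effort on likely positives; the classifier, thresholds, and batch sizes used for Places itself are not specified here.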
Abstract
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state-of-the-art Convolutional Neural Networks, we provide strong baseline performance for scene classification. With its high coverage and high diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.
