Remote Sensing Image Scene Classification: Benchmark and State of the Art
Gong Cheng, Junwei Han, Xiaoqiang Lu
TL;DR
This work surveys remote sensing image scene classification, identifying limitations in existing public datasets (small scale, limited variation, accuracy saturation) and proposing NWPU-RESISC45 as a large-scale benchmark with 31,500 images across 45 classes. It analyzes three families of methods (handcrafted features, unsupervised feature learning, and deep feature learning) and shows that deep CNN features, especially when fine-tuned, achieve the best performance on the new dataset. The dataset combines broad geographic coverage and substantial intra-class diversity to stress-test data-driven approaches, enabling robust comparisons and progress in the field. The findings highlight the practical impact of large, varied remote sensing datasets for advancing deep learning methods and set the stage for integrating auxiliary data sources in future work.
Abstract
Remote sensing image scene classification plays an important role in a wide range of applications and has therefore received considerable attention. In recent years, significant effort has gone into developing datasets and approaches for scene classification from remote sensing images. However, a systematic review of the literature on datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have several limitations, including a small number of scene classes and images, a lack of image variation and diversity, and saturated accuracy. These limitations severely hinder the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in both the number of scene classes and the total number of images, (ii) exhibits large variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated on the proposed dataset, and the results are reported as a useful baseline for future research.
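As a rough illustration of the dataset's balanced structure (45 classes with 700 images each, 31,500 in total), the sketch below builds a class-stratified train/test split over image indices. It is a minimal sketch, not the authors' evaluation protocol: the helper name `stratified_split`, the 20% training ratio, and the random seed are all assumptions chosen for illustration, and the integer labels stand in for the actual image files.

```python
import random
from collections import defaultdict

# Figures stated in the abstract.
NUM_CLASSES = 45
IMAGES_PER_CLASS = 700
TOTAL_IMAGES = NUM_CLASSES * IMAGES_PER_CLASS  # 45 * 700 = 31,500


def stratified_split(labels, train_ratio, seed=0):
    """Split sample indices so every class contributes the same
    fraction of its images to the training set.

    Hypothetical helper for illustration; not part of any dataset toolkit.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = int(round(train_ratio * len(indices)))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Placeholder labels: class c repeated 700 times, for c in 0..44.
labels = [c for c in range(NUM_CLASSES) for _ in range(IMAGES_PER_CLASS)]

# Assumed training ratio of 20%, chosen only for this example.
train_idx, test_idx = stratified_split(labels, train_ratio=0.2)
```

Because every class has the same number of images, a per-class split keeps the training and test sets balanced, so overall accuracy is not dominated by any single scene class.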
