Remote Sensing Image Scene Classification: Benchmark and State of the Art

Gong Cheng, Junwei Han, Xiaoqiang Lu

TL;DR

This work surveys remote sensing image scene classification, identifies the limitations of existing public datasets (small scale, limited variation, accuracy saturation), and proposes NWPU-RESISC45 as a large-scale benchmark with 31,500 images across 45 classes. It analyzes three families of methods (handcrafted features, unsupervised feature learning, and deep feature learning) and shows that deep CNN features, especially when fine-tuned, achieve the best performance on the new dataset. The dataset combines broad geographic coverage with substantial intra-class diversity to stress-test data-driven approaches and to support like-for-like comparisons between methods. The findings highlight the practical value of large, varied remote sensing datasets for advancing deep learning methods and set the stage for integrating auxiliary data sources in future work.

Abstract

Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. During the past years, significant efforts have been made to develop various datasets or present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small scale of scene classes and the image numbers, the lack of image variations and diversity, and the saturation of accuracy. These limitations severely limit the development of new approaches especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total image number, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research.

Paper Structure

This paper contains 21 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Eight scene images from the popular UC Merced land use dataset: (a) dense residential, (b) medium residential, (c) freeway, (d) runway, (e) river, (f) golf course, (g) intersection, and (h) overpass.
  • Figure 2: Some example images from the proposed NWPU-RESISC45 dataset, which was carefully designed to cover a wide range of weather conditions, seasons, illumination conditions, imaging conditions, and scales. Accordingly, these images generally have rich variations in translation, viewpoint, object pose and appearance, spatial resolution, illumination, background, and occlusion.
  • Figure 3: Overall accuracies of BoVW, BoVW+SPM, and LLC with visual codebook sizes of 500, 1000, 2000, and 5000, under training ratios of (a) $10\%$ and (b) $20\%$.
  • Figure 4: Confusion matrices under the training ratio of $10\%$ by using the following methods: (a) Color histograms, (b) BoVW, (c) VGGNet-16, and (d) Fine-tuned VGGNet-16.
  • Figure 5: Confusion matrices under the training ratio of $20\%$ by using the following methods: (a) Color histograms, (b) BoVW, (c) VGGNet-16, and (d) Fine-tuned VGGNet-16.
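
The figure captions above refer to training ratios of 10% and 20%, overall accuracies, and confusion matrices. The following is a minimal Python sketch of how those quantities relate, under stated assumptions: the training ratio is applied per class (a stratified split), and overall accuracy is taken as correctly classified test images divided by all test images. The helper names (stratified_split, confusion_matrix, overall_accuracy) are hypothetical and are not taken from the paper; only the dataset sizes (45 classes, 700 images per class) come from the abstract.

```python
import random
from collections import defaultdict

NUM_CLASSES = 45        # from the abstract
IMAGES_PER_CLASS = 700  # from the abstract


def stratified_split(labels, train_ratio, seed=0):
    """Split indices so that train_ratio of each class goes to training.

    Assumption: the ratio is applied per class, not to the pooled dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_ratio)
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return train_idx, test_idx


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Placeholder labels standing in for the real image list.
    labels = [c for c in range(NUM_CLASSES) for _ in range(IMAGES_PER_CLASS)]
    for ratio in (0.10, 0.20):
        train_idx, test_idx = stratified_split(labels, ratio)
        print(f"ratio={ratio:.2f} train={len(train_idx)} test={len(test_idx)}")
```

Under these assumptions, a 10% ratio gives 70 training and 630 test images per class (3,150 and 28,350 in total), and a 20% ratio gives 140 and 560 per class (6,300 and 25,200 in total).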