Multi-Label Scene Classification in Remote Sensing Benefits from Image Super-Resolution

Ashitha Mudraje, Brian B. Moser, Stanislav Frolov, Andreas Dengel

TL;DR

This work addresses the loss of multi-label scene classification accuracy caused by the limited spatial resolution of remote sensing imagery. It evaluates four pre-trained image super-resolution (SR) models (SRResNet, HAT, SeeSR, and RealESRGAN) as a pre-processing step before training four CNN classifiers (ResNet-50, ResNet-101, ResNet-152, and Inception-v4). On BigEarthNet patches, SR pre-processing improves multiple metrics: SRResNet benefits shallow networks, the attention-based HAT yields the strongest gains for deeper architectures, and generative SR methods can introduce artifacts. The study provides a practical framework for integrating SR into remote sensing (RS) systems and offers guidance on selecting SR techniques to improve downstream multi-label predictions.

Abstract

Satellite imagery is a cornerstone of numerous Remote Sensing (RS) applications; however, limited spatial resolution frequently hinders the precision of such systems, especially in multi-label scene classification tasks, which require a higher level of detail and feature differentiation. In this study, we explore the efficacy of image Super-Resolution (SR) as a pre-processing step to enhance the quality of satellite images and thus improve downstream classification performance. We investigate four SR models (SRResNet, HAT, SeeSR, and RealESRGAN) and evaluate their impact on multi-label scene classification across various CNN architectures, including ResNet-50, ResNet-101, ResNet-152, and Inception-v4. Our results show that applying SR significantly improves downstream classification performance across various metrics, demonstrating its ability to preserve spatial details critical for multi-label tasks. Overall, this work offers valuable insights into the selection of SR techniques for multi-label prediction in remote sensing and presents an easy-to-integrate framework to improve existing RS systems.
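
The pre-processing idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: `upscale_nearest` is a hypothetical stand-in for a pre-trained SR model (such as SRResNet or HAT), the scale factor of 2 and the decision threshold of 0.5 are assumptions, and the classifier is any callable that returns one logit per label. The point is only the order of operations: enhance first, classify second, then threshold each label independently.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]  # single-band image stored as rows of pixel values


def upscale_nearest(image: Image, factor: int = 2) -> Image:
    """Hypothetical placeholder for an SR model: nearest-neighbour upsampling."""
    return [
        [pixel for pixel in row for _ in range(factor)]  # repeat each column
        for row in image
        for _ in range(factor)  # repeat each row
    ]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Multi-label decision: every class is thresholded on its own,
    so an image can receive zero, one, or several labels."""
    return [int(sigmoid(z) >= threshold) for z in logits]


def run_pipeline(
    image: Image,
    sr_model: Callable[[Image], Image],
    classifier: Callable[[Image], Sequence[float]],
    threshold: float = 0.5,
) -> List[int]:
    """Apply SR as a pre-processing step, then classify the enhanced image."""
    enhanced = sr_model(image)
    return predict_labels(classifier(enhanced), threshold)


if __name__ == "__main__":
    # Toy classifier (an assumption for illustration): two logits derived
    # from the mean brightness of the enhanced image.
    def toy_classifier(img: Image) -> List[float]:
        mean = sum(map(sum, img)) / (len(img) * len(img[0]))
        return [mean - 0.25, 0.25 - mean]

    patch = [[0.0, 1.0], [1.0, 0.0]]
    print(run_pipeline(patch, upscale_nearest, toy_classifier))
```

Because the SR step is passed in as a plain callable, swapping one SR method for another leaves the classification code unchanged, which is what makes such a step easy to bolt onto an existing system.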

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Illustration of our proposed pipeline, which uses image super-resolution as a pre-processing step for multi-label scene classification to improve label prediction.
  • Figure 2: SR comparison of HAT and RealESRGAN. While HAT provides relatively balanced enhancements, RealESRGAN tends to hallucinate details (e.g., overemphasizing streets and introducing artificial patterns in forest regions), illustrating the pitfalls of generative SR methods in certain remote sensing scenes.
  • Figure 3: Comparison of Grad-CAM visualizations in ResNet models between the baseline (No SR) and various SR methods. SRResNet (for the shallow classifier) and HAT (for the deeper classifier) lead to broader activation coverage across the whole satellite image.
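
The "activation coverage" mentioned in the Figure 3 caption is a visual impression of the heatmaps. One way to make it concrete, offered purely as an illustration and not as a measure used in the paper, is the fraction of heatmap cells whose min-max-normalised activation reaches a threshold. The function name `activation_coverage` and the threshold of 0.5 are assumptions.

```python
from typing import List


def activation_coverage(heatmap: List[List[float]], threshold: float = 0.5) -> float:
    """Fraction of cells whose normalised activation is >= threshold.

    Assumes a non-empty 2D heatmap (e.g. a Grad-CAM map). Values are
    min-max normalised to [0, 1] first, so maps with different scales
    can be compared.
    """
    values = [v for row in heatmap for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat map carries no localisation signal; report zero coverage.
        return 0.0
    normalised = [(v - lo) / (hi - lo) for v in values]
    return sum(v >= threshold for v in normalised) / len(normalised)
```

Under this definition a map concentrated on a single hotspot scores low, while a map that responds across many regions of the image scores high. This matches the qualitative reading of the caption, though the threshold choice strongly affects the number.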