Generic Knowledge Boosted Pre-training For Remote Sensing Images

Ziyue Huang, Mingming Zhang, Yuan Gong, Qingjie Liu, Yunhong Wang

TL;DR

This work tackles the domain gap between remote sensing and natural images by proposing GeRSP, a teacher-student pre-training framework that fuses natural-image supervision with RS self-supervision. It jointly optimizes a natural-image auxiliary learning branch and a remote-sensing contrastive learning branch, using $L_{total}=L_{ct}+\alpha L_{ce}$ and an EMA teacher with momentum $m=0.996$ plus a dynamic queue of $65{,}536$ negatives. The framework is evaluated on three RS downstream tasks (scene classification, object detection, segmentation) and shows consistent improvements over ImageNet pre-training and RS-only methods, validated on datasets such as EuroSAT, NWPU-RESISC45, DIOR, DOTA, and LoveDA. The results indicate that incorporating general image knowledge with RS-specific learning yields more robust, transferable representations for RS understanding tasks, with qualitative CAM analyses confirming improved focus on RS semantic regions.
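
The two formulas quoted above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `total_loss` and `ema_update` are hypothetical, parameters are plain lists of floats rather than network weights, and the update rule assumes the common momentum form $\theta_t \leftarrow m\,\theta_t + (1-m)\,\theta_s$, which the summary does not spell out. The value of $\alpha$ is not given here, so it is left as a required argument.

```python
from typing import List, Sequence


def total_loss(l_ct: float, l_ce: float, alpha: float) -> float:
    """Combine the two branch losses as L_total = L_ct + alpha * L_ce.

    l_ct:  contrastive loss from the remote sensing branch
    l_ce:  cross-entropy loss from the natural-image branch
    alpha: weighting factor (value not stated in this summary)
    """
    return l_ct + alpha * l_ce


def ema_update(teacher: Sequence[float],
               student: Sequence[float],
               m: float = 0.996) -> List[float]:
    """Return the teacher parameters after one EMA step.

    Assumed update rule: t <- m * t + (1 - m) * s, applied element-wise.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]
```

With $m=0.996$ the teacher moves only 0.4% of the way toward the student at each step, so it changes slowly and gives the contrastive branch a stable target.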

Abstract

Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. The backbones of most existing remote sensing deep learning models are initialized with weights obtained from ImageNet pre-training (IMP). However, domain gaps exist between remote sensing images and natural images (e.g., ImageNet), so models initialized with IMP weights perform poorly on remote sensing image understanding. Although some pre-training methods have been studied in the remote sensing community, current remote sensing pre-training methods suffer from vague generalization because they use only remote sensing images. In this paper, we propose a novel remote sensing pre-training framework, Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP), to learn robust representations from remote sensing and natural images for remote sensing understanding tasks. GeRSP contains two pre-training branches: (1) a self-supervised pre-training branch that learns domain-related representations from unlabeled remote sensing images, and (2) a supervised pre-training branch that learns general knowledge from labeled natural images. GeRSP combines the two branches in a teacher-student architecture to simultaneously learn representations with general and special knowledge, which yields a powerful pre-trained model for deep learning model initialization. Finally, we evaluate GeRSP and other remote sensing pre-training methods on three downstream tasks, i.e., object detection, semantic segmentation, and scene classification. The extensive experimental results consistently demonstrate that GeRSP can effectively learn robust representations in a unified manner, improving the performance of remote sensing downstream tasks.
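
The TL;DR characterizes the self-supervised branch as contrastive learning against a dynamic queue of 65,536 negatives. The sketch below shows one plausible way such a branch could score a query, assuming an InfoNCE-style loss and a first-in-first-out queue; neither detail is confirmed by this summary. The names `info_nce` and `enqueue` and the temperature value of 0.2 are assumptions made for illustration, and embeddings are plain Python lists.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Sequence

QUEUE_SIZE = 65536  # number of negatives quoted in the TL;DR


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query: Sequence[float],
             positive_key: Sequence[float],
             negatives: Iterable[Sequence[float]],
             temperature: float = 0.2) -> float:
    """InfoNCE loss for one query: -log softmax over [positive, negatives].

    The temperature is an assumed hyperparameter, not taken from the source.
    """
    logits = [dot(query, positive_key) / temperature]
    logits.extend(dot(query, key) / temperature for key in negatives)
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[0]


def enqueue(queue: Deque[List[float]], keys: Iterable[List[float]]) -> None:
    """Append new teacher keys; a deque with maxlen drops the oldest entries."""
    queue.extend(keys)


# Usage: a bounded queue that keeps only the most recent QUEUE_SIZE keys.
negative_queue: Deque[List[float]] = deque(maxlen=QUEUE_SIZE)
enqueue(negative_queue, [[0.0, 1.0], [-1.0, 0.0]])
loss = info_nce([1.0, 0.0], [1.0, 0.0], negative_queue)
```

In this arrangement the keys placed in the queue would come from the slowly updated teacher, which keeps older negatives roughly consistent with newer ones.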

Paper Structure

This paper contains 20 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: RS images encompass a wealth of domain-specific knowledge, whereas natural images offer a broader range of generic image knowledge. The motivation of GeRSP is to enhance the generalization performance of remote sensing pre-training (RSP) by leveraging the diversity present in natural images.
  • Figure 2: The overall framework of our proposed Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP). GeRSP integrates two learning processes: natural image auxiliary learning (NIAL) on labeled natural images and remote sensing contrastive learning (RSCL) on unlabeled RS images. NIAL trains the model with supervision from labeled natural images, while RSCL adopts a contrastive learning approach. The pre-trained model is subsequently fine-tuned on various downstream tasks using task-specific data.
  • Figure 3: Data Augmentation Pipeline for pre-training.
  • Figure 4: Class activation maps (CAMs) visualization of GeRSP model and IMP model on six categories.
  • Figure 5: Class activation maps (CAMs) visualization of GeRSP model and IMP model on beach, cloud, and terrace.
  • ...and 1 more figure