Generic Knowledge Boosted Pre-training For Remote Sensing Images
Ziyue Huang, Mingming Zhang, Yuan Gong, Qingjie Liu, Yunhong Wang
TL;DR
This work tackles the domain gap between remote sensing (RS) and natural images by proposing GeRSP, a teacher-student pre-training framework that fuses natural-image supervision with RS self-supervision. It jointly optimizes a natural-image auxiliary learning branch and an RS contrastive learning branch under the objective $L_{total}=L_{ct}+\alpha L_{ce}$, using an EMA teacher with momentum $m=0.996$ and a dynamic queue of $65{,}536$ negatives. The framework is evaluated on three RS downstream tasks (scene classification, object detection, and segmentation) on datasets such as EuroSAT, NWPU-RESISC45, DIOR, DOTA, and LoveDA, where it shows consistent improvements over ImageNet pre-training and RS-only methods. The results indicate that combining general image knowledge with RS-specific learning yields more robust, transferable representations for RS understanding tasks, and qualitative CAM analyses confirm an improved focus on RS semantic regions.
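The pieces named in this summary (the EMA teacher update, the fixed-size queue of negatives, and the combined objective) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: it assumes a MoCo-style InfoNCE form for $L_{ct}$ and a standard cross-entropy for $L_{ce}$, neither of which is spelled out here, and the function names, the `temperature` value, and the default `alpha` are hypothetical. Only $m=0.996$ and the queue size of $65{,}536$ come from the text.

```python
import math
from collections import deque

def ema_update(teacher, student, m=0.996):
    """Move each teacher weight toward the student: t <- m*t + (1-m)*s."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _log_sum_exp(values):
    peak = max(values)  # subtract the max for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def contrastive_loss(query, positive_key, negatives, temperature=0.2):
    """Assumed InfoNCE form of L_ct: the positive key competes with queued negatives."""
    logits = [_dot(query, positive_key) / temperature]
    logits += [_dot(query, k) / temperature for k in negatives]
    return _log_sum_exp(logits) - logits[0]

def cross_entropy(logits, label):
    """Assumed form of L_ce for the labeled natural-image branch."""
    return _log_sum_exp(logits) - logits[label]

def total_loss(l_ct, l_ce, alpha=1.0):
    """L_total = L_ct + alpha * L_ce (alpha=1.0 is a placeholder value)."""
    return l_ct + alpha * l_ce

# A deque with maxlen drops the oldest key once 65,536 negatives are stored.
negative_queue = deque(maxlen=65536)

if __name__ == "__main__":
    negative_queue.extend([[0.0, 1.0], [-1.0, 0.0]])
    l_ct = contrastive_loss([1.0, 0.0], [1.0, 0.0], list(negative_queue))
    l_ce = cross_entropy([2.0, 0.5, -1.0], label=0)
    print(total_loss(l_ct, l_ce, alpha=1.0))
    print(ema_update([1.0, 1.0], [0.0, 2.0]))
```

In this sketch only the student would receive gradients; the teacher is refreshed through `ema_update` after each step, and its outputs would be the keys appended to `negative_queue`.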
Abstract
Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. The backbones of most existing remote sensing deep learning models are initialized with weights obtained from ImageNet pre-training (IMP). However, domain gaps exist between remote sensing images and natural images (e.g., ImageNet), so models initialized with IMP weights perform poorly on remote sensing image understanding. Although some pre-training methods have been studied in the remote sensing community, they use only remote sensing images and therefore generalize poorly. In this paper, we propose a novel remote sensing pre-training framework, Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP), which learns robust representations from both remote sensing and natural images for remote sensing understanding tasks. GeRSP contains two pre-training branches: (1) a self-supervised pre-training branch that learns domain-related representations from unlabeled remote sensing images, and (2) a supervised pre-training branch that learns general knowledge from labeled natural images. GeRSP combines the two branches in a teacher-student architecture so that representations carrying both general and domain-specific knowledge are learned simultaneously, producing a powerful pre-trained model for initializing deep learning models. Finally, we evaluate GeRSP and other remote sensing pre-training methods on three downstream tasks: object detection, semantic segmentation, and scene classification. The extensive experimental results consistently demonstrate that GeRSP learns robust representations in a unified manner and improves performance on remote sensing downstream tasks.
