Vision Foundation Models in Remote Sensing: A Survey

Siqi Lu; Junlin Guo; James R Zimmer-Dauphinee; Jordan M Nieusma; Xiao Wang; Parker VanValkenburgh; Steven A Wernke; Yuankai Huo

TL;DR

The paper surveys vision foundation models for remote sensing released between June 2021 and June 2024, organizing them by architecture, pretraining datasets, and methodologies. It identifies self-supervised learning, especially contrastive learning and masked autoencoder approaches, and transformer-based backbones as key drivers of robust, transferable representations across image-, region-, and pixel-level tasks. The review highlights performance trends across environmental monitoring, agriculture, archaeology, urban planning, and disaster management, and discusses practical implications, data/computational challenges, and future directions. It advocates for efficient architectures, enhanced multi-modal data integration, and broader data diversity to enable real-world deployment and generalization across diverse geographies and sensor modalities.

Abstract

Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by these models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. We also find that pre-training methods, particularly self-supervised learning techniques such as contrastive learning and masked autoencoders, substantially improve the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for the continued development and application of foundation models in remote sensing.

Paper Structure (34 sections, 2 equations, 4 figures, 3 tables)

Introduction
Background
Remote Sensing
Foundation Models for Remote Sensing
Related Review Papers
Pretraining Methods
Self-Supervised Learning
Predictive Coding
Contrastive Learning
Supervised Pretraining
Image Analysis Methods
Image Perception at Different Levels
Image-Level
Region-Level
Pixel-Level
...and 19 more sections

Figures (4)

Figure 1: Overview of some well-known foundation models for remote sensing released from June 2021 to June 2024.
Figure 2: Examples of data types used by these foundation models and of the downstream tasks they support. Data: (1) Panchromatic [arbeck_english_2013], (2) True Color, (3) SAR [SAR], (4) Hyperspectral [arbeck_english_2013], (5) Multispectral [arbeck_english_2013]. Downstream tasks: (1) Segmentation, (2) Object Detection, (3) Classification [fmow2018], (4) Change Detection [ChangeDetection].
Figure 3: General pipeline of SSL [jing2019selfsupervised]. Images for the diverse datasets and the pretext tasks are drawn from ImageNet [imageNet], BigEarthNet [Sumbul_2019_Bigearthnet], and MillionAID [Long2022ASP_millionaid]. The fine-tuning dataset includes images from DIOR [Li_2020_DIOR].
Figure 5: The Vision Transformer architecture.
