
SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery

Caleb S. Spradlin, Jordan A. Caraballo-Vega, Jian Li, Mark L. Carroll, Jie Gong, Paul M. Montesano

TL;DR

This paper introduces SatVision-TOA, a novel foundation model pre-trained on 14-band MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery to address the need for models that can handle moderate- and coarse-resolution all-sky remote sensing data.

Abstract

Foundation models have the potential to transform remote sensing (RS) data analysis by enabling large computer vision models to be pre-trained on vast amounts of RS data. These models can then be fine-tuned with small amounts of labeled training data and applied to a variety of applications. Most existing foundation models are designed for high-spatial-resolution, cloud-free satellite imagery or photos, which limits their applicability in scenarios that require frequent temporal monitoring or broad spectral profiles. As a result, foundation models trained solely on cloud-free images have limited utility for applications that involve atmospheric variables or require atmospheric corrections. We introduce SatVision-TOA, a novel foundation model pre-trained on 14-band MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery, addressing the need for models pre-trained to handle moderate- and coarse-resolution all-sky remote sensing data. SatVision-TOA is pre-trained with a Masked Image Modeling (MIM) framework on the SwinV2 architecture and learns detailed contextual representations through self-supervised learning, without the need for labels. It is a 3-billion-parameter model trained on 100 million images; to our knowledge, this is the largest foundation model trained solely on satellite RS imagery. Results show that SatVision-TOA outperforms baseline methods on downstream tasks such as 3D cloud retrieval. Notably, the model achieves a mean intersection over union (mIoU) of 0.46, a substantial improvement over the baseline mIoU of 0.22. Additionally, the rate of false negatives in the fine-tuning task was reduced by over 50% compared with the baseline. Our work advances pre-trained vision modeling for multispectral RS by learning from a variety of atmospheric and aerosol conditions to improve cloud and land surface monitoring.
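The abstract states that SatVision-TOA is pre-trained with a Masked Image Modeling (MIM) framework, and Figure 1 describes randomly applied 8×8 mask patches covering 60% of the image. The sketch below illustrates that masking step plus a reconstruction loss, under assumptions that this summary does not confirm: a 128×128 chip, a zero fill value for masked pixels, and an L1 loss computed only on masked pixels (a SimMIM-style choice). The function names `generate_mim_mask`, `apply_mask`, and `masked_l1_loss` are hypothetical, and the sketch handles a single band; for the 14-band input, sharing one mask across all bands is likewise an assumption.

```python
import random

# Hypothetical helpers illustrating MIM-style masking. Chip size (128),
# zero fill, and masked-only L1 loss are assumptions, not the paper's code.

def generate_mim_mask(input_size=128, mask_patch_size=8, mask_ratio=0.6, seed=None):
    """Return an input_size x input_size mask (1 = masked pixel).

    The chip is tiled into non-overlapping mask_patch_size x mask_patch_size
    patches, and round(mask_ratio * num_patches) of them are chosen uniformly
    at random without replacement.
    """
    if input_size % mask_patch_size != 0:
        raise ValueError("input_size must be divisible by mask_patch_size")
    rng = random.Random(seed)
    per_side = input_size // mask_patch_size
    num_patches = per_side * per_side
    num_masked = round(mask_ratio * num_patches)
    mask = [[0] * input_size for _ in range(input_size)]
    for pid in rng.sample(range(num_patches), num_masked):
        row0 = (pid // per_side) * mask_patch_size
        col0 = (pid % per_side) * mask_patch_size
        for r in range(row0, row0 + mask_patch_size):
            for c in range(col0, col0 + mask_patch_size):
                mask[r][c] = 1
    return mask


def apply_mask(band, mask, fill=0.0):
    """Replace masked pixels of one band (list of rows) with a fill value."""
    return [[fill if m else v for v, m in zip(b_row, m_row)]
            for b_row, m_row in zip(band, mask)]


def masked_l1_loss(pred, target, mask):
    """Mean absolute error over masked pixels only (visible pixels are ignored)."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0
```

With the defaults, the chip holds 16 × 16 = 256 patches and round(0.6 × 256) = 154 of them are masked, so the masked fraction is 154/256 ≈ 0.602 rather than exactly 60%. Restricting the loss to masked pixels forces the model to infer hidden content from visible context instead of copying its input.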

Paper Structure

This paper contains 26 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Examples of image reconstruction by SatVision-TOA. Left: MOD021KM v6.1 cropped image chip using MODIS bands [1, 3, 2]. Middle: The same images with randomly applied 8×8 mask patches, masking 60% of the original image. Right: The reconstructed images produced by the model, along with their respective Structural Similarity Index Measure (SSIM) scores. These examples illustrate the model's ability to preserve structural detail and reconstruct heterogeneous features, such as cloud textures and land-cover transitions, with high fidelity.
  • Figure 2: Examining model performance on 3D cloud retrieval downstream task. Left: ABI image chips showing the CloudSat/CALIPSO curtain transect. Middle-left: Cloud predictions from the SatVision-TOA-Giant and baseline FCN models. The bottom color bar indicates the location along the transect where clouds are predicted by the models. Middle-right: Ground truth vertical cloud retrieval mask from CloudSat/CALIPSO. Right: Difference between the predicted vertical cloud mask and the ground truth, where red indicates false positives and blue indicates false negatives.
  • Figure 3: 3D Cloud Retrieval Results: Receiver Operating Characteristic (ROC) curves comparing the SatVision-TOA (SVTOA-FCN) and baseline (FCN-Baseline) models, with the Area Under the Curve (AUC) reported for each model to highlight differences in classification performance.
  • Figure 4: 3D Cloud Retrieval Predictions
  • Figure 5: 3D Cloud Retrieval Predictions
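The abstract reports mIoU and a reduction in false negatives, and Figures 2 and 3 discuss false positives, false negatives, and ROC AUC for the 3D cloud retrieval task. The sketch below shows one common way to compute these quantities for a binary cloud/clear vertical mask flattened to a list of pixels. It assumes that mIoU is the mean of the per-class IoU over the cloud and clear classes, that the false negative rate is FN / (FN + TP), and that hard predictions come from thresholding scores at 0.5; the paper may aggregate differently (for example, per transect), and all function names are hypothetical.

```python
# Hypothetical metric helpers for a binary (cloud = 1, clear = 0) vertical
# cloud mask flattened to 1-D. The two-class mIoU definition is an assumption.

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_miou(y_true, y_pred):
    """Mean of the cloud-class IoU and the clear-class IoU."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    iou_cloud = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    iou_clear = tn / (tn + fp + fn) if tn + fp + fn else 1.0
    return (iou_cloud + iou_clear) / 2


def false_negative_rate(y_true, y_pred):
    """Fraction of true cloud pixels that the model labels clear."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return fn / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random cloud pixel scores higher
    than a random clear pixel (ties count half). O(P * N) pairwise loop."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.8, 0.1, 0.6, 0.2, 0.7, 0.3]
    y_pred = [int(s >= 0.5) for s in scores]    # assumed 0.5 threshold
    print(binary_miou(y_true, y_pred))          # 0.6
    print(false_negative_rate(y_true, y_pred))  # 0.25
    print(roc_auc(y_true, scores))              # 0.9375
```

The pairwise AUC loop is quadratic in the number of pixels, which keeps the definition transparent; for full-resolution curtains, a rank-based or library implementation would be more practical.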