Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models

Benedikt Blumenstiel, Viktoria Moor, Romeo Kienzler, Thomas Brunschwiler

TL;DR

This work addresses scalable retrieval of multispectral remote-sensing images by leveraging Geospatial Foundation Models (GeoFMs) to produce rich, multi-band embeddings without fine-tuning. It demonstrates that Prithvi-100M, a six-band GeoFM, delivers state-of-the-art retrieval on two multispectral benchmarks (BigEarthNet-43 and ForestNet-12) compared to RGB-based models. The authors also compare vector-based embeddings with binary hash compressions, showing that 64-bit trivial hashes offer substantially faster retrieval with minimal accuracy loss, while 32-bit hashes degrade more noticeably. The work provides practical baselines, a public implementation, and suggests that GeoFMs enable robust, scalable retrieval in earth observation with minimal domain adaptation.

Abstract

Image retrieval enables efficient search through vast amounts of satellite imagery and returns images similar to a query. Deep learning models can identify images across various semantic concepts without the need for annotations. This work proposes to use Geospatial Foundation Models, like Prithvi, for remote sensing image retrieval with multiple benefits: i) the models encode multi-spectral satellite data and ii) generalize without further fine-tuning. We introduce two datasets to the retrieval task and observe strong performance: Prithvi processes six bands and achieves a mean Average Precision of 97.62% on BigEarthNet-43 and 44.51% on ForestNet-12, outperforming RGB-based models. Further, we evaluate three compression methods with binarized embeddings that balance retrieval speed and accuracy. They match the retrieval speed of much shorter hash codes while maintaining the same accuracy as floating-point embeddings at a 32-fold compression. The code is available at https://github.com/IBM/remote-sensing-image-retrieval.
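The 32-fold compression mentioned in the abstract follows from replacing each 32-bit float with a single bit. The sketch below illustrates that arithmetic and a Hamming-distance ranking over the packed codes. It is only an illustration under stated assumptions: thresholding each dimension at zero, the 768-dimensional random embeddings, and the helper names `binarize` and `hamming_rank` are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold each dimension at zero (an assumption) and pack 8 bits per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Return database indices sorted by Hamming distance to the query code."""
    xor = np.bitwise_xor(db_codes, query_code)          # bytes with differing bits set
    distances = np.unpackbits(xor, axis=1).sum(axis=1)  # count differing bits per row
    return np.argsort(distances, kind="stable")


# Hypothetical stand-in for model embeddings: 1000 images, 768 float32 dimensions.
rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 768)).astype(np.float32)

codes = binarize(database)                       # shape (1000, 96), dtype uint8
compression = database.nbytes / codes.nbytes     # 4 bytes per value vs. 1 bit per value
ranking = hamming_rank(codes[42], codes)         # the query itself has distance 0
```

With float32 inputs the ratio is 32 regardless of the embedding dimension, since every 4-byte value becomes one bit.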

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: GeoFM embeddings enable simple but accurate CBIR. Optionally, the embeddings are compressed into smaller binary vectors. For each query image, similar images from the database are returned and sorted based on a distance function.
  • Figure 2: t-SNE plots of the ForestNet-4 test set with colored classes comparing Prithvi-100M embeddings.
  • Figure 3: Examples from two datasets with query images (left), their labels, and retrieved images (right) using Prithvi-100M and the 64-bit trivial hash. Green frames indicate positive matches, while red frames indicate images with different labels. Orange frames indicate partially correct matches, where the number gives the count of matching labels among the multi-labels.
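
The mean Average Precision reported in the abstract can be computed from match/no-match flags like the green and red frames in Figure 3. The sketch below shows the standard calculation. It assumes each retrieved image is simply marked relevant or not; this section does not state the authors' exact protocol (for instance, any rank cutoff or how partial multi-label matches are scored), and the function names are hypothetical.

```python
import numpy as np


def average_precision(relevant) -> float:
    """AP for one ranked result list; relevant[i] is True if result i matches the query."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)                  # number of matches up to each rank
    ranks = np.arange(1, len(relevant) + 1)
    precision_at_hits = hits[relevant] / ranks[relevant]
    return float(precision_at_hits.mean())


def mean_average_precision(rankings) -> float:
    """Mean of the per-query AP values."""
    return float(np.mean([average_precision(r) for r in rankings]))


# Two hypothetical queries: matches at ranks 1 and 3, then a match at rank 2.
example = [[True, False, True], [False, True]]
score = mean_average_precision(example)
```

For the first query, precision is 1/1 at the first match and 2/3 at the second, giving an AP of 5/6; the second query has an AP of 1/2, so the mean is 2/3.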