REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing

Binger Chen, Tacettin Emre Bök, Behnood Rasti, Volker Markl, Begüm Demir

TL;DR

This work introduces RS-FMD, the first structured, schema-guided database of more than 150 remote sensing foundation models (RSFMs), and REMSA, a modular LLM-based agent that automates FM selection from natural-language queries. By grounding its reasoning in RS-FMD metadata and combining retrieval, in-context ranking, clarification, and memory, REMSA matches models to user constraints and explains its choices transparently. The authors also establish the first expert-driven benchmark for RSFM selection (75 queries and 900 ratings) and show that REMSA outperforms retrieval-only and unstructured baselines while keeping its methodology transparent and reproducible. Together, RS-FMD and REMSA offer a principled, scalable path to automated, explainable FM selection under real-world deployment constraints in remote sensing.
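The TL;DR describes constraint-aware matching over structured model metadata. The Python sketch below illustrates that idea under clearly stated assumptions: the record fields (`modalities`, `tasks`, `params_m`), the `Query` type, the model names, and the helper functions are all hypothetical and do not reproduce the actual RS-FMD schema. The final sort by parameter count is only a stand-in for REMSA's LLM-based in-context ranking.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRecord:
    # Hypothetical subset of a schema-guided record; not the real RS-FMD schema.
    name: str
    modalities: set[str]
    tasks: set[str]
    params_m: float  # parameter count in millions (assumed field)


@dataclass
class Query:
    # Hypothetical structured form of a parsed natural-language query.
    modality: str
    task: str
    max_params_m: Optional[float] = None  # deployment budget; None means no limit


def filter_candidates(db: list[ModelRecord], q: Query) -> list[ModelRecord]:
    """Keep only records that satisfy every hard constraint in the query."""
    kept = []
    for rec in db:
        if q.modality not in rec.modalities or q.task not in rec.tasks:
            continue
        if q.max_params_m is not None and rec.params_m > q.max_params_m:
            continue
        kept.append(rec)
    return kept


def rank(candidates: list[ModelRecord]) -> list[ModelRecord]:
    """Placeholder for LLM in-context ranking: smallest model first, ties by name."""
    return sorted(candidates, key=lambda rec: (rec.params_m, rec.name))


db = [
    ModelRecord("model_a", {"sar"}, {"segmentation"}, 300.0),
    ModelRecord("model_b", {"sar", "multispectral"}, {"segmentation", "classification"}, 90.0),
    ModelRecord("model_c", {"multispectral"}, {"classification"}, 25.0),
]

budgeted = Query(modality="sar", task="segmentation", max_params_m=100.0)
unbounded = Query(modality="sar", task="segmentation")
print([r.name for r in rank(filter_candidates(db, budgeted))])   # ['model_b']
print([r.name for r in rank(filter_candidates(db, unbounded))])  # ['model_b', 'model_a']
```

Separating hard filtering from ranking in this sketch guarantees that no candidate violating a stated constraint reaches the ranker, whatever ordering the ranker produces. The source describes REMSA's ranking as in-context ranking by an LLM, which a fixed sort key only crudely approximates.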

Abstract

Foundation Models (FMs) are increasingly used in remote sensing (RS) for tasks such as environmental monitoring, disaster assessment, and land-use mapping. These models include unimodal vision encoders trained on a single data modality and multimodal architectures trained on combinations of SAR, multispectral, hyperspectral, and image-text data. They support diverse RS tasks, including semantic segmentation, image classification, change detection, and visual question answering. However, selecting an appropriate remote sensing foundation model (RSFM) remains difficult due to scattered documentation, heterogeneous formats, and varied deployment constraints. We introduce the RSFM Database (RS-FMD), a structured resource covering over 150 RSFMs spanning multiple data modalities, resolutions, and learning paradigms. Built on RS-FMD, we present REMSA, the first LLM-based agent for automated RSFM selection from natural-language queries. REMSA interprets user requirements, resolves missing constraints, ranks candidate models using in-context learning, and provides transparent justifications. We also propose a benchmark of 75 expert-verified RS query scenarios, producing 900 configurations under an expert-centered evaluation protocol. REMSA outperforms several baselines, including naive agents, dense retrieval, and unstructured RAG-based LLMs. It operates entirely on publicly available metadata and does not access private or sensitive data.
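The abstract states that REMSA resolves missing constraints before ranking. One simple way to picture that step, assuming a slot-filling design that the source does not specify, is to check the parsed query for required fields and ask one follow-up question per gap, then merge the user's answers back into the query state. The slot names, question texts, and helper functions below are hypothetical.

```python
from __future__ import annotations

# Hypothetical required slots and prompts; REMSA's actual constraint set may differ.
REQUIRED_SLOTS = ("modality", "task")
QUESTIONS = {
    "modality": "Which sensor data will you use (e.g., SAR, multispectral, hyperspectral)?",
    "task": "Which downstream task do you need (e.g., segmentation, change detection)?",
}


def missing_slots(state: dict[str, str]) -> list[str]:
    """Return required slots that are absent or empty in the parsed query state."""
    return [slot for slot in REQUIRED_SLOTS if not state.get(slot)]


def clarification_questions(state: dict[str, str]) -> list[str]:
    """Map each missing slot to a follow-up question for the user."""
    return [QUESTIONS[slot] for slot in missing_slots(state)]


def merge_answers(state: dict[str, str], answers: dict[str, str]) -> dict[str, str]:
    """Fold user answers into a new state dict; answers override earlier values."""
    return {**state, **answers}


state = {"task": "change detection"}  # e.g. parsed from "I need a model for change detection"
print(missing_slots(state))  # ['modality']
state = merge_answers(state, {"modality": "SAR"})
print(missing_slots(state))  # []
```

Keeping the query state as a plain dictionary that accumulates answers across turns is one illustrative way to carry context between clarification rounds; it is not a description of REMSA's memory component.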

Paper Structure

This paper contains 21 sections, 1 equation, 1 figure, 5 tables, and 1 algorithm.

Figures (1)

  • Figure 1: Architecture of REMSA