Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents

Peilin Feng, Zhutao Lv, Junyan Ye, Xiaolei Wang, Xinjie Huo, Jinhua Yu, Wanghan Xu, Wenlong Zhang, Lei Bai, Conghui He, Weijia Li

TL;DR

Earth-Agent addresses the limitations of RGB-centric EO models with a tool-augmented agent framework, built on MCP and ReAct-style reasoning, that unifies RGB and spectral data for multi-step EO analysis. The agent dynamically invokes 104 domain-specific EO tools organized into five kits (Index, Inversion, Perception, Analysis, Statistics). It is evaluated on Earth-Bench, a benchmark of 248 expert-curated tasks spanning 13,729 images, with a dual-level protocol that assesses both reasoning trajectories and final outcomes. Experiments across LLM backbones, general agent frameworks, and MLLMs show improved cross-modal reasoning and quantitative analysis on EO tasks, while exposing current bottlenecks on spectral data and the need for more robust perception models. Overall, Earth-Agent defines a scalable, scientifically grounded paradigm for next-generation EO analysis using tool-enabled LLMs and establishes Earth-Bench as a rigorous benchmark for evaluating both reasoning processes and results.

Abstract

Earth observation (EO) is essential for understanding the evolving states of the Earth system. Although recent MLLMs have advanced EO research, they still lack the capability to tackle complex tasks that require multi-step reasoning and the use of domain-specific tools. Agent-based methods offer a promising direction, but current attempts remain in their infancy: they are confined to RGB perception and shallow reasoning, and they lack systematic evaluation protocols. To overcome these limitations, we introduce Earth-Agent, the first agentic framework that unifies RGB and spectral EO data within an MCP-based tool ecosystem, enabling cross-modal, multi-step, and quantitative spatiotemporal reasoning beyond pretrained MLLMs. Earth-Agent supports complex scientific tasks such as geophysical parameter retrieval and quantitative spatiotemporal analysis by dynamically invoking expert tools and models across modalities. To support comprehensive evaluation, we further propose Earth-Bench, a benchmark of 248 expert-curated tasks with 13,729 images, spanning spectrum, products, and RGB modalities, and equipped with a dual-level evaluation protocol that assesses both reasoning trajectories and final outcomes. We conduct comprehensive experiments across different LLM backbones, together with comparisons against general agent frameworks and against MLLMs on remote sensing benchmarks, demonstrating both the effectiveness and potential of Earth-Agent. Earth-Agent establishes a new paradigm for EO analysis, moving the field toward scientifically grounded, next-generation applications of LLMs in Earth observation.
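
To make the idea of multi-step reasoning with tool calls more concrete, here is a minimal sketch of a ReAct-style loop in Python. It is not the authors' implementation. The tool registry, the tool names (`ndvi_mean`, `difference`), the task layout, and the scripted policy that stands in for the LLM are all assumptions made for illustration. The real system exposes its tools through MCP, which this sketch does not model. NDVI is used only as a familiar example of a spectral index.

```python
from typing import Callable, Dict, List

# Hypothetical in-process tool registry (assumption: the real tools are served via MCP).
TOOLS: Dict[str, Callable[..., float]] = {}


def tool(name: str):
    """Register a function under a tool name so the agent can call it by name."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@tool("ndvi_mean")
def ndvi_mean(nir: List[float], red: List[float]) -> float:
    """Mean NDVI = (NIR - Red) / (NIR + Red) over paired pixel values."""
    values = [(n - r) / (n + r) for n, r in zip(nir, red) if (n + r) != 0]
    return sum(values) / len(values)


@tool("difference")
def difference(a: float, b: float) -> float:
    """Plain subtraction, used here as a toy statistics tool."""
    return a - b


def scripted_policy(task: dict, memory: List[dict]):
    """Stand-in for the LLM: choose (thought, action, args) from the memory so far."""
    if len(memory) == 0:
        return "Compute NDVI for the first date.", "ndvi_mean", task["t0"]
    if len(memory) == 1:
        return "Compute NDVI for the second date.", "ndvi_mean", task["t1"]
    if len(memory) == 2:
        args = {"a": memory[1]["observation"], "b": memory[0]["observation"]}
        return "Compare the two dates.", "difference", args
    return "The change is known.", "finish", {"answer": memory[2]["observation"]}


def run_agent(policy, task: dict, max_steps: int = 5):
    """Think, act (call a tool), observe, update memory; repeat until 'finish'."""
    memory: List[dict] = []
    for _ in range(max_steps):
        thought, action, args = policy(task, memory)
        if action == "finish":
            return args["answer"], memory
        observation = TOOLS[action](**args)
        memory.append({"thought": thought, "action": action, "observation": observation})
    raise RuntimeError("step budget exhausted before the policy finished")


# Toy task: change in mean NDVI between two dates (made-up reflectance values).
TASK = {
    "t0": {"nir": [0.6, 0.6], "red": [0.2, 0.2]},
    "t1": {"nir": [0.9, 0.9], "red": [0.1, 0.1]},
}

answer, trajectory = run_agent(scripted_policy, TASK)
print("tool sequence:", [step["action"] for step in trajectory])
print("NDVI change:", round(answer, 3))
```

The loop returns the trajectory along with the final answer, so an evaluator in the spirit of the dual-level protocol could inspect both the sequence of tool calls and the end result. How Earth-Bench actually scores either level is not specified here.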

Paper Structure

This paper contains 30 sections, 12 equations, 39 figures, 8 tables.

Figures (39)

  • Figure 1: Overview of our work. The top panel contrasts prior paradigms, MLLM-based EO research (left) and existing agent-based EO research (middle), with our Earth-Agent (right). The bottom panel illustrates our contributions, including Earth-Bench construction, Earth-Agent ReAct with the predefined toolkit, and dual-level evaluation of both reasoning trajectories and final results.
  • Figure 2: Earth-Agent solving tasks across Spectrum, Products, and RGB data through multi-step reasoning with expert tool calls.
  • Figure 3: Earth-Agent Framework: The left panel illustrates the ReAct-style workflow, where Earth-Agent iteratively performs tool calling, memory update, thinking, and action using domain-specific toolkits. The right panel presents the dual-level evaluation protocol, assessing both step-by-step reasoning trajectories and end-to-end outcomes.
  • Figure 4: Dataset Comparison and Overview: The left panel compares Earth-Bench with prior MLLM and agent-based EO benchmarks. The right panel presents the statistics of Earth-Bench and its evaluation with SOTA LLMs using Earth-Agent, highlighting the difficulty of Earth-Bench.
  • Figure 5: Construction and Annotation of Earth-Bench. The left shows question generation from EO data, the right illustrates the data annotation pipeline that simulates ReAct-style trajectories, and the bottom provides an example explaining the multi-step annotation process.
  • ...and 34 more figures