Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows

Dimitrios Stamoulis, Diana Marculescu

TL;DR

Geo-OLM tackles the high cost of GPT-4o-based geospatial Copilots by introducing a state-driven prompting paradigm that decouples task progression from tool calling, enabling small open language models (≤7B parameters) to perform complex Earth observation (EO) tasks with competitive accuracy. The framework models EO workflows as finite-state machines that guide per-state tool use, error handling, and completion checks, which improves reliability for low-resource models. Evaluation across proprietary and open LLM families on the GeoLLM-Engine and GeoLLM-Squad benchmarks, plus a Turkey earthquake case study, shows near-GPT-4o performance with large open models at 10–100× lower cost, with the gap staying within 10–20% even at smaller model sizes. These findings indicate that structured, state-driven geospatial agents are practical for sustainable, scalable EO research and policymaking.

Abstract

Geospatial Copilots hold immense potential for automating Earth observation (EO) and climate monitoring workflows, yet their reliance on large-scale models such as GPT-4o introduces a paradox: tools intended for sustainability studies often incur unsustainable costs. Using agentic AI frameworks in geospatial applications can amass thousands of dollars in API charges or require expensive, power-intensive GPUs for deployment, creating barriers for researchers, policymakers, and NGOs. Unfortunately, when geospatial Copilots are deployed with open language models (OLMs), performance often degrades due to their dependence on GPT-optimized logic. In this paper, we present Geo-OLM, a tool-augmented geospatial agent that leverages the novel paradigm of state-driven LLM reasoning to decouple task progression from tool calling. By alleviating the workflow reasoning burden, our approach enables low-resource OLMs to complete geospatial tasks more effectively. When downsizing to small models below 7B parameters, Geo-OLM outperforms the strongest prior geospatial baselines by 32.8% in successful query completion rates. Our method performs comparably to proprietary models, achieving results within 10% of GPT-4o while reducing inference costs by two orders of magnitude, from $500–$1000 to under $10. We present an in-depth analysis with geospatial downstream benchmarks, providing key insights to help practitioners effectively deploy OLMs for EO applications.
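As a quick check of the "two orders of magnitude" figure, the arithmetic can be sketched as follows. This is only an illustration: it treats the stated "under $10" as an upper bound of exactly $10, and the function name `cost_reduction` is hypothetical, not taken from the paper.

```python
import math


def cost_reduction(baseline_usd: float, reduced_usd: float) -> float:
    """Return how many times cheaper the reduced cost is than the baseline."""
    if reduced_usd <= 0:
        raise ValueError("reduced_usd must be positive")
    return baseline_usd / reduced_usd


# Assumption: "under $10" is taken as exactly $10, so these are lower bounds.
for baseline in (500.0, 1000.0):
    factor = cost_reduction(baseline, 10.0)
    orders = math.log10(factor)
    print(f"${baseline:.0f} -> $10: {factor:.0f}x cheaper (~{orders:.1f} orders of magnitude)")
```

With these inputs the reduction is at least 50× at the low end of the baseline range and at least 100× at the high end, which matches the abstract's claim.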

Paper Structure

This paper contains 12 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: State-driven LLM reasoning with Geo-OLM. Earth observation (EO) workflows exhibit inherent progression logic, which we encapsulate using the StateFlow paradigm (Wu et al., 2024). We illustrate this first in a single-agent scenario for satellite detection in remote sensing (RS) tasks (left) and extend it flexibly to a multi-agent setup incorporating diverse EO applications (right). By structuring reasoning into explicit states, Geo-OLM reduces the cognitive burden on low-resource OLMs, allowing them to better reflect on task progression, correctly handle errors, and prevent premature function termination.
  • Figure 2: Agentic success and correctness rates vs. benchmarking cost (for a typical benchmark size of 2K user queries) across various models (distinguished by markers) and prompting techniques (distinguished by colors). Overall, Geo-OLM achieves better performance-cost trade-offs across model scales from 72B down to 3B parameters, sustaining agentic performance at reduced computational expense (best viewed in color).
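
The state-driven idea in Figure 1 can be illustrated with a minimal finite-state machine. This is a sketch under stated assumptions and not the authors' implementation: the state names, the stub tools (`load_imagery`, `detect_objects`, `verify`), and the retry policy are all hypothetical, and the language model call that would choose tool arguments within each state is replaced by a fixed tool per state. What it shows is that task progression, error handling, and the completion check live in the transition table rather than in the model's own reasoning.

```python
from enum import Enum, auto


class State(Enum):
    LOAD_IMAGERY = auto()
    DETECT_OBJECTS = auto()
    VERIFY = auto()
    ERROR = auto()
    DONE = auto()


# Hypothetical stand-ins for real geospatial tools.
def load_imagery(ctx):
    ctx["images"] = ["tile_0", "tile_1"]


def detect_objects(ctx):
    # Simulate one transient failure when the caller asks for it.
    if ctx.pop("fail_once", False):
        raise RuntimeError("detector timeout")
    ctx["detections"] = {img: 3 for img in ctx["images"]}


def verify(ctx):
    # Completion check: refuse to finish without detections.
    if not ctx.get("detections"):
        raise RuntimeError("no detections to report")


# Each state exposes exactly one tool; transitions are fixed in code.
TOOLS = {
    State.LOAD_IMAGERY: load_imagery,
    State.DETECT_OBJECTS: detect_objects,
    State.VERIFY: verify,
}
NEXT = {
    State.LOAD_IMAGERY: State.DETECT_OBJECTS,
    State.DETECT_OBJECTS: State.VERIFY,
    State.VERIFY: State.DONE,
}


def run(ctx, max_retries=2):
    """Drive the workflow to DONE and return the list of visited state names."""
    state, failed_state, retries, trace = State.LOAD_IMAGERY, None, 0, []
    while state is not State.DONE:
        trace.append(state.name)
        if state is State.ERROR:
            if retries >= max_retries:
                raise RuntimeError(f"giving up: {ctx['last_error']}")
            retries += 1
            state = failed_state  # retry the state that failed
            continue
        try:
            TOOLS[state](ctx)
            state = NEXT[state]
        except RuntimeError as exc:
            ctx["last_error"] = str(exc)
            failed_state, state = state, State.ERROR
    trace.append(State.DONE.name)
    return trace


if __name__ == "__main__":
    print(run({"fail_once": True}))
```

Because the workflow can only reach `DONE` through `VERIFY`, a premature termination is ruled out by construction, and a tool failure routes through `ERROR` back to the failing state instead of derailing the whole task.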