Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows
Dimitrios Stamoulis, Diana Marculescu
TL;DR
Geo-OLM tackles the high cost of geospatial Copilots that rely on large models such as GPT-4o by introducing a state-driven prompting paradigm that decouples task progression from tool calling, enabling small open language models (≤7B parameters) to perform complex Earth observation (EO) tasks with competitive accuracy. The framework models EO workflows as finite-state machines that guide per-state tool use, error handling, and completion checks, which improves reliability for low-resource models. Evaluation across proprietary and open LLM families on the GeoLLM-Engine and GeoLLM-Squad benchmarks, plus a Turkey earthquake case study, shows near-GPT-4o performance for the larger models, 10–100× cost reductions with open models, and gaps of 10–20% at smaller model sizes. These findings demonstrate the practicality of structured, state-driven geospatial agents for sustainable, scalable EO research and policymaking.
Abstract
Geospatial Copilots hold immense potential for automating Earth observation (EO) and climate monitoring workflows, yet their reliance on large-scale models such as GPT-4o introduces a paradox: tools intended for sustainability studies often incur unsustainable costs. Using agentic AI frameworks in geospatial applications can amass thousands of dollars in API charges or require expensive, power-intensive GPUs for deployment, creating barriers for researchers, policymakers, and NGOs. Unfortunately, when geospatial Copilots are deployed with open language models (OLMs), performance often degrades because their logic is optimized for GPT models. In this paper, we present Geo-OLM, a tool-augmented geospatial agent that leverages the novel paradigm of state-driven LLM reasoning to decouple task progression from tool calling. By alleviating the workflow reasoning burden, our approach enables low-resource OLMs to complete geospatial tasks more effectively. When downsizing to small models below 7B parameters, Geo-OLM outperforms the strongest prior geospatial baselines by 32.8% in successful query completion rates. Our method performs comparably to proprietary models, achieving results within 10% of GPT-4o while reducing inference costs by two orders of magnitude, from $500–$1000 to under $10. We present an in-depth analysis with geospatial downstream benchmarks, providing key insights to help practitioners effectively deploy OLMs for EO applications.
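To make the idea of decoupling task progression from tool calling concrete, the following is a minimal sketch of a state-driven workflow loop. It is not the Geo-OLM implementation: the `State` class, `run_workflow`, the two toy tools, and the stub model are all hypothetical names chosen for illustration. The assumption is that the finite-state machine, not the model, decides which state comes next, while the model only picks a tool from the ones allowed in the current state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    name: str
    tools: List[str]                     # tools the model may call in this state
    is_complete: Callable[[dict], bool]  # completion check on the shared context
    next_state: Optional[str]            # None marks the terminal state
    max_attempts: int = 3                # hypothetical per-state attempt budget


def run_workflow(states: Dict[str, State], start: str,
                 choose_tool: Callable[[str, List[str], dict], str],
                 tool_impls: Dict[str, Callable[[dict], None]],
                 context: dict) -> list:
    """Advance through states; the model only chooses a tool inside each state."""
    trace = []
    current: Optional[str] = start
    while current is not None:
        state = states[current]
        attempts = 0
        while not state.is_complete(context):
            if attempts >= state.max_attempts:
                raise RuntimeError(f"state {state.name!r} did not complete")
            attempts += 1
            tool = choose_tool(state.name, state.tools, context)
            if tool not in state.tools:
                # Reject tool calls that are not permitted in this state.
                trace.append((state.name, tool, "rejected"))
                continue
            try:
                tool_impls[tool](context)
                trace.append((state.name, tool, "ok"))
            except Exception as exc:
                # Per-state error handling: record the error and retry.
                context["last_error"] = str(exc)
                trace.append((state.name, tool, "error"))
        current = state.next_state  # progression is fixed by the FSM
    return trace


# Toy tools standing in for real geospatial operations (illustrative only).
def load_images(ctx: dict) -> None:
    ctx["images"] = ["tile_a", "tile_b"]


def filter_clouds(ctx: dict) -> None:
    ctx["filtered"] = [t for t in ctx["images"] if t != "tile_b"]


STATES = {
    "load": State("load", ["load_images"], lambda c: "images" in c, "filter"),
    "filter": State("filter", ["filter_clouds"], lambda c: "filtered" in c, None),
}
TOOLS = {"load_images": load_images, "filter_clouds": filter_clouds}


def stub_model(state_name: str, allowed: List[str], ctx: dict) -> str:
    # Stand-in for a small open language model: picks the first allowed tool.
    return allowed[0]


context: dict = {}
trace = run_workflow(STATES, "load", stub_model, TOOLS, context)
```

In this sketch, a model that proposes a tool outside the current state's list is rejected and eventually exhausts the attempt budget, so it cannot silently derail the workflow. This is one plausible reading of why per-state guidance would help low-resource models.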
