
GISclaw: An Open-Source LLM-Powered Agent System for Full-Stack Geospatial Analysis

Jinzhen Han, JinByeong Lee, Yuri Shim, Jisung Kim, Jae-Joon Lee

Abstract

The convergence of Large Language Models (LLMs) and Geographic Information Science has opened new avenues for automating complex geospatial analysis. However, existing LLM-powered GIS agents are constrained by limited data-type coverage (vector-only), reliance on proprietary GIS platforms, and single-model architectures that preclude systematic comparisons. We present GISclaw, an open-source agent system that integrates an LLM reasoning core with a persistent Python sandbox, a comprehensive suite of open-source GIS libraries (GeoPandas, rasterio, scipy, scikit-learn), and a web-based interactive interface for full-stack geospatial analysis spanning vector, raster, and tabular data. GISclaw implements two pluggable agent architectures -- a Single Agent ReAct loop and a Dual Agent Plan-Execute-Replan pipeline -- and supports six heterogeneous LLM backends ranging from cloud-hosted flagship models (GPT-5.4) to locally deployed 14B models on consumer GPUs. Through three key engineering innovations -- Schema Analysis bridging the task-data information gap, Domain Knowledge injection for domain-specific workflows, and an Error Memory mechanism for intelligent self-correction -- GISclaw achieves up to 96% task success on the 50-task GeoAnalystBench benchmark. Systematic evaluation across 600 model--architecture--task combinations reveals that the Dual Agent architecture consistently degrades strong models while providing marginal gains for weaker ones. We further propose a three-layer evaluation protocol incorporating code structure analysis, reasoning process assessment, and type-specific output verification for comprehensive GIS agent assessment. The system and all evaluation code are publicly available.
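The abstract's Error Memory mechanism can be illustrated with a minimal sketch. This is not GISclaw's actual API: the names (`llm_propose`, `run_step`, `FakeGDF`, `MAX_ITERS`) and the toy "LLM" that fixes a typo on the second attempt are illustrative assumptions; the real system feeds tracebacks from a persistent Python sandbox back into the model's context.

```python
# Hedged sketch of a ReAct-style loop with an Error Memory buffer.
# All names here are illustrative, not GISclaw's API.

MAX_ITERS = 3

class FakeGDF:
    """Stand-in for a GeoPandas GeoDataFrame."""
    def buffer(self, distance):
        return f"buffered:{distance}"

def llm_propose(task, error_memory):
    # Stand-in for an LLM call: once an error is remembered,
    # the "model" emits corrected code.
    if error_memory:
        return "result = gdf.buffer(100)"   # corrected attempt
    return "result = gdf.bufer(100)"        # first attempt has a typo

def run_step(task):
    error_memory = []                       # accumulated failures, re-fed to the model
    ns = {"gdf": FakeGDF()}                 # persistent sandbox namespace stand-in
    for _ in range(MAX_ITERS):
        code = llm_propose(task, error_memory)
        try:
            exec(code, ns)                  # execute the proposed code
            return ns.get("result"), error_memory
        except Exception as e:
            # Record the error so the next proposal can self-correct.
            error_memory.append(f"{type(e).__name__}: {e}")
    return None, error_memory
```

Running `run_step("buffer roads by 100 m")` fails once with an `AttributeError`, records it, and succeeds on the second attempt, which is the self-correction pattern the abstract describes.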


Paper Structure

This paper contains 25 sections, 5 equations, 12 figures, 9 tables.

Figures (12)

  • Figure 1: Overview of the GISclaw system architecture. The system accepts natural-language tasks with GIS data (vector, raster, tabular) as input, routes them through a pluggable LLM backend, and executes analysis in a persistent Python sandbox. Two agent architectures---Single Agent (ReAct) and Dual Agent (Plan-Execute-Replan)---are supported, with outputs evaluated via a three-layer protocol.
  • Figure 2: Comparison of the two agent architectures. (a) Single Agent follows a ReAct loop with Error Memory for self-correction. (b) Dual Agent decomposes tasks via a Planner, executes steps through a Worker, and adaptively replans upon failure.
  • Figure 3: Multi-layer evaluation pipeline. Three complementary layers assess code-level fidelity (L1), reasoning process quality (L2), and output correctness (L3), combined into a weighted composite score $S_{\text{comp}}$ (Eq. \ref{eq:composite}).
  • Figure 4: Task success rate comparison: Single Agent (solid) vs. Dual Agent (hatched). Strong models degrade significantly in DA mode; weaker models show minimal change.
  • Figure 5: Per-task API F1 scores (DA architecture). Warm colors indicate higher code-level agreement with gold standards. The task-dependent variance highlights the limitation of aggregate metrics.
  • ...and 7 more figures
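The weighted composite score from Figure 3 can be sketched as follows. The exact formula and weights of the paper's $S_{\text{comp}}$ are not given in this excerpt, so the weighted sum and the default weights below are assumptions, not the published equation.

```python
# Hedged sketch: aggregating the three evaluation layers into a composite score.
# l1 = code-structure fidelity, l2 = reasoning quality, l3 = output correctness,
# each in [0, 1]. The weights are illustrative, not the paper's values.

def composite_score(l1, l2, l3, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of per-layer scores; weights must sum to 1."""
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9, "weights should sum to 1"
    return w1 * l1 + w2 * l2 + w3 * l3
```

For example, a task scoring 0.8 on code structure, 0.6 on reasoning, and 0.9 on output would receive a composite of 0.3·0.8 + 0.3·0.6 + 0.4·0.9 = 0.78 under these assumed weights.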