
GeoAI Agency Primitives

Akram Zaytar, Rohan Sawahn, Caleb Robinson, Gilles Q. Hacheme, Girmaw A. Tadesse, Inbal Becker-Reshef, Rahul Dodhia, Juan Lavista Ferres

Abstract

We present ongoing research on agency primitives for GeoAI assistants -- core capabilities that connect foundation models to the artifact-centric, human-in-the-loop workflows where GIS practitioners actually work. Despite advances in satellite image captioning, visual question answering, and promptable segmentation, these capabilities have not translated into productivity gains for practitioners who spend most of their time producing vector layers, raster maps, and cartographic products. The gap is not model capability alone but the absence of an agency layer that supports iterative collaboration. We propose a vocabulary of nine primitives for such a layer -- including navigation, perception, geo-referenced memory, and dual modeling -- along with a benchmark that measures human productivity. Our goal is a vocabulary that makes agentic assistance in GIS implementable, testable, and comparable.

Paper Structure

This paper contains 14 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: General agent workflow. User queries are parsed, decomposed into tasks, and executed recursively. Completed tasks yield suggestions for user review before committing.
  • Figure 2: Core sensing primitives. (Top-left) Navigation constructs context bundles specifying sub-ROIs, zoom, and sampling strategy. (Top-right) Perception routes patches to task-appropriate models, returning labels and notes. (Bottom-left) GeoMemory stores spatial notes for retrieval and curation. (Bottom-right) Embeddings map inputs to vectors for similarity search and modeling.
  • Figure 3: Execution and enrichment primitives. (Top-left) Compute Graphs translate queries into directed operation graphs. (Top-right) Budgets enforce constraints enabling partial results and early stopping. (Bottom-left) Attribution enriches geometries with external data. (Bottom-right) Dual Modeling iterates between expensive VLM judgments and cheap scalable inference.
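The captions above describe mechanisms that can be made concrete in code. The following is a minimal, hypothetical sketch of two of them: a Navigation context bundle (sub-ROI, zoom, sampling strategy) and a Budget that enforces a patch quota with early stopping, yielding partial results. All names and fields are illustrative assumptions, not an API from the paper.

```python
from dataclasses import dataclass

@dataclass
class ContextBundle:
    """Hypothetical Navigation output: where and how to look."""
    roi: tuple      # (min_lon, min_lat, max_lon, max_lat) sub-ROI
    zoom: int       # tile zoom level to fetch
    sampling: str   # sampling strategy, e.g. "grid" or "random"

@dataclass
class Budget:
    """Hypothetical Budget primitive: a hard quota on patches processed."""
    max_patches: int
    used: int = 0

    def allow(self) -> bool:
        # Return True while the quota is not exhausted, counting each use.
        if self.used >= self.max_patches:
            return False
        self.used += 1
        return True

def run_perception(bundle: ContextBundle, patches, budget: Budget):
    """Route patches to a (stubbed) model until the budget stops us.

    Early stopping means partial results are still returned for review.
    """
    results = []
    for patch in patches:
        if not budget.allow():
            break  # budget exhausted: stop early, keep what we have
        results.append({"patch": patch, "label": "stub"})
    return results

bundle = ContextBundle(roi=(-10.0, 35.0, 0.0, 45.0), zoom=14, sampling="grid")
budget = Budget(max_patches=3)
out = run_perception(bundle, patches=list(range(5)), budget=budget)
print(len(out))  # 3: processing stopped once the quota was spent
```

The design choice sketched here, checking the budget before each unit of work rather than after, is what makes partial results well-defined: every item in `out` was fully processed before the stop.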