ShapefileGPT: A Multi-Agent Large Language Model Framework for Automated Shapefile Processing

Qingming Lin, Rui Hu, Huaxia Li, Sensen Wu, Yadong Li, Kai Fang, Hailin Feng, Zhenhong Du, Liuchang Xu

TL;DR

This work proposes ShapefileGPT, an LLM-powered multi-agent framework that automates Shapefile processing for spatial analysis. It handles complex vector data analysis tasks and reaches a 95.24% task success rate on the authors' benchmark, outperforming the GPT series models.

Abstract

Vector data is one of the two core data structures in geographic information science (GIS), essential for accurately storing and representing geospatial information. Shapefile, the most widely used vector data format, has become the industry standard supported by all major geographic information systems. However, processing this data typically requires specialized GIS knowledge and skills, creating a barrier for researchers from other fields and impeding interdisciplinary research in spatial data analysis. Moreover, while large language models (LLMs) have made significant advancements in natural language processing and task automation, they still face challenges in handling the complex spatial and topological relationships inherent in GIS vector data. To address these challenges, we propose ShapefileGPT, a framework powered by LLMs and specifically designed to automate Shapefile tasks. ShapefileGPT uses a multi-agent architecture in which the planner agent is responsible for task decomposition and supervision, while the worker agent executes the tasks. We developed a specialized function library for handling Shapefiles and provided comprehensive API documentation, enabling the worker agent to operate on Shapefiles efficiently through function calling. For evaluation, we developed a benchmark dataset based on authoritative textbooks, encompassing tasks in categories such as geometric operations and spatial queries. ShapefileGPT achieved a task success rate of 95.24%, outperforming the GPT series models. Compared with traditional LLMs, ShapefileGPT effectively handles complex vector data analysis tasks, overcoming their limitations in spatial analysis. This result opens new pathways for advancing automation and intelligence in the GIS field, with significant potential in interdisciplinary data analysis and application contexts.

Paper Structure

This paper contains 20 sections, 14 figures, 6 tables.

Figures (14)

  • Figure 1: ShapefileGPT consists of a planner agent and a worker agent. The planner agent interprets user queries and decomposes them into subtasks, while the worker agent executes these subtasks by selecting appropriate functions from a predefined function library to perform Shapefile-related operations.
  • Figure 2: The design of the ShapefileGPT function library documentation. The YAML format (left) is concise and used for LLM context guidance, while the JSON format (right) is structured for external system interaction and result validation.
  • Figure 3: The planning loop in ShapefileGPT's multi-agent framework. The planner agent continuously monitors task progress, formulates new subtasks based on real-time conditions, and updates the task environment to ensure seamless task execution.
  • Figure 4: The worker loop in ShapefileGPT's multi-agent framework. The worker agent selects the appropriate API from a specialized function library to execute subtasks, ensuring precise handling of Shapefile data.
  • Figure 5: Example of a task from the Shapefile task dataset. This example demonstrates a geometric operation task, which involves creating minimum bounding rectangles for point groups in a Shapefile.
  • ...and 9 more figures
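
The planner and worker loops described in the captions for Figures 1, 3, and 4 can be illustrated with a minimal sketch in pure Python. Everything here is a hypothetical stand-in rather than the authors' implementation: the function names (`group_points`, `bounding_rectangles`), the `TaskEnvironment` class, and the fixed `plan` list are assumptions made for illustration, and both agents are replaced by simple deterministic functions where the real framework would call an LLM. The example task loosely mirrors Figure 5, and it assumes that "minimum bounding rectangle" means the axis-aligned envelope, which may differ from the paper's function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def group_points(points: List[Tuple[str, Point]]) -> Dict[str, List[Point]]:
    """Group (label, point) pairs by their label."""
    groups: Dict[str, List[Point]] = {}
    for label, pt in points:
        groups.setdefault(label, []).append(pt)
    return groups


def bounding_rectangles(groups: Dict[str, List[Point]]) -> Dict[str, BBox]:
    """Compute the axis-aligned bounding rectangle of each point group."""
    result: Dict[str, BBox] = {}
    for label, pts in groups.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        result[label] = (min(xs), min(ys), max(xs), max(ys))
    return result


# Hypothetical function library: the worker may only call what is listed here.
FUNCTION_LIBRARY: Dict[str, Callable] = {
    "group_points": group_points,
    "bounding_rectangles": bounding_rectangles,
}


@dataclass
class TaskEnvironment:
    """Shared state that the planner inspects and the worker updates."""
    data: object
    history: List[str] = field(default_factory=list)


def planner_next_subtask(env: TaskEnvironment, plan: List[str]) -> Optional[str]:
    """Stand-in for the planner LLM: return the next unfinished step, or None."""
    done = len(env.history)
    return plan[done] if done < len(plan) else None


def worker_execute(subtask: str, env: TaskEnvironment) -> None:
    """Stand-in for the worker LLM: look up the named function and call it."""
    func = FUNCTION_LIBRARY[subtask]
    env.data = func(env.data)
    env.history.append(subtask)


def run(points: List[Tuple[str, Point]], plan: List[str]) -> TaskEnvironment:
    """Alternate planner and worker steps until the planner reports completion."""
    env = TaskEnvironment(data=points)
    while (subtask := planner_next_subtask(env, plan)) is not None:
        worker_execute(subtask, env)
    return env


if __name__ == "__main__":
    sample = [("a", (0.0, 0.0)), ("a", (2.0, 1.0)),
              ("b", (5.0, 5.0)), ("b", (4.0, 7.0))]
    env = run(sample, ["group_points", "bounding_rectangles"])
    print(env.history)
    print(env.data)
```

The design point the sketch tries to capture is the separation of concerns: the planner only decides what comes next from the state of the task environment, while the worker is restricted to functions registered in the library, so each step stays inspectable and easy to validate.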