Comparative Evaluation of Prompting and Fine-Tuning for Applying Large Language Models to Grid-Structured Geospatial Data

Akash Dhruv, Yangxinyu Xie, Jordan Branham, Tanwi Mallick

TL;DR

This study examines how large language models can interpret the grid-structured geospatial data used in weather resilience planning, comparing zero-shot prompting with fine-tuning on a ClimRR-inspired dataset. It shows that a small open-weight model (LLaMA 3.1 with 8B parameters) can perform basic value extraction and scenario comparisons through prompting alone, but suffers from numeric and unit inconsistencies. Fine-tuning with LoRA on ~100 task-specific examples substantially improves both semantic alignment ($0.8954$) and exact-value accuracy ($1.0$), underscoring the importance of domain-specific adaptation for precise spatiotemporal reasoning. The results support practical, resource-efficient agentic workflows for real-time, region-aware weather interpretation, and the study outlines avenues for broadening the dataset and integrating real-time data streams via ClimRR APIs.

Abstract

This paper presents a comparative study of large language models (LLMs) in interpreting grid-structured geospatial data. We evaluate the performance of a base model through structured prompting and contrast it with a fine-tuned variant trained on a dataset of user-assistant interactions. Our results highlight the strengths and limitations of zero-shot prompting and demonstrate the benefits of fine-tuning for structured geospatial and temporal reasoning.

Paper Structure

This paper contains 7 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Schematic showing the mapping of gridded ClimRR data over the United States. Each grid cell is assigned an alphanumeric tag (e.g., R073C493) and contains atmospheric variable values in tabular form. These values can be transformed into a user–input–assistant format suitable for prompting and fine-tuning the language model.
  • Figure 2: Comparison of reference outputs with responses from the base and fine-tuned models to user queries, highlighting differences (shown in red) in accuracy and reasoning. These examples illustrate common ambiguities in base model responses to geospatial climate queries, including challenges in referencing correct RCP scenarios, handling measurement units, and making accurate regional comparisons. In contrast, fine-tuned models show improved alignment with reference answers across all categories, demonstrating an enhanced understanding of domain-specific nuances.
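
The transformation described in the Figure 1 caption, from a tagged grid cell's tabular values to a user–input–assistant example, might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' pipeline: the reading of a tag such as R073C493 as row 73, column 493 is a guess from its shape, the helper names `parse_tag` and `to_record`, the dictionary keys, the question template, and the unit string are all hypothetical, and the numbers in the usage lines are placeholders rather than ClimRR data.

```python
import re

# Assumed tag layout: 'R' + 3-digit row index, 'C' + 3-digit column index.
TAG_PATTERN = re.compile(r"^R(\d{3})C(\d{3})$")


def parse_tag(tag):
    """Split a tag such as 'R073C493' into (row, column) integers."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized grid tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def to_record(tag, variable, unit, values):
    """Build one user-input-assistant example for a single grid cell.

    `values` maps a scenario label to a numeric value for `variable`.
    """
    parse_tag(tag)  # validate the tag before building the record
    # Flatten the cell's table into one "scenario: value unit" line per row.
    table = "\n".join(f"{name}: {value} {unit}" for name, value in values.items())
    # Reference answer: the scenario with the largest value.
    highest = max(values, key=values.get)
    return {
        "user": f"Which scenario has the highest {variable} in grid cell {tag}?",
        "input": table,
        "assistant": (
            f"{highest} has the highest {variable} in {tag}: "
            f"{values[highest]} {unit}."
        ),
    }


# Placeholder values for illustration only.
example = to_record(
    "R073C493",
    "annual maximum temperature",
    "degF",
    {"historical": 90.0, "RCP4.5": 93.5, "RCP8.5": 95.0},
)
```

Stating the unit explicitly in both the input table and the reference answer is a deliberate choice here, since the caption of Figure 2 names unit handling and RCP scenario references as common failure points for the base model.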