Comparative Evaluation of Prompting and Fine-Tuning for Applying Large Language Models to Grid-Structured Geospatial Data
Akash Dhruv, Yangxinyu Xie, Jordan Branham, Tanwi Mallick
TL;DR
This study examines how large language models interpret grid-structured geospatial data used in weather resilience planning, comparing zero-shot prompting with fine-tuning on a ClimRR-inspired dataset. A small open-weight model (LLaMA 3.1, 8B parameters) can perform basic value extraction and scenario comparison through prompting alone, but its outputs show numeric and unit inconsistencies. Fine-tuning with LoRA on ~100 task-specific examples substantially improves both semantic alignment (0.8954) and exact-value accuracy (1.0), underscoring the importance of domain-specific adaptation for precise spatiotemporal reasoning. The results support practical, resource-efficient agentic workflows for real-time, region-aware weather interpretation, and the study outlines avenues for broadening the dataset and integrating real-time data streams via ClimRR APIs.
Abstract
This paper presents a comparative study of how large language models (LLMs) interpret grid-structured geospatial data. We evaluate a base model under structured prompting and contrast it with a fine-tuned variant trained on a dataset of user-assistant interactions. Our results highlight the strengths and limitations of zero-shot prompting and demonstrate the benefits of fine-tuning for structured geospatial and temporal reasoning.
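To make the evaluation setup concrete, the following is a minimal sketch of how grid cells might be serialized into a structured prompt and how an exact-value check on a model response could be scored. Everything here is an assumption for illustration: the cell identifier format (`R101C202`), the helper names (`grid_to_prompt`, `exact_value_match`, `exact_value_accuracy`), the sample values, and the matching rule (any number in the response equal to the expected value within a tolerance) are hypothetical and are not taken from the paper's dataset or metric definitions.

```python
import re

# Matches integers and decimals, optionally negative (assumed response format).
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def grid_to_prompt(grid, question):
    """Serialize grid cells as 'cell_id: value unit' lines, then append the question.

    `grid` maps a hypothetical cell identifier to a (value, unit) tuple.
    """
    lines = [f"{cell_id}: {value} {unit}" for cell_id, (value, unit) in grid.items()]
    return "Grid data:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


def exact_value_match(response, expected, tol=1e-6):
    """Return True if any number in the response equals `expected` within `tol`."""
    return any(abs(float(m) - expected) <= tol for m in NUMBER.findall(response))


def exact_value_accuracy(responses, expected_values):
    """Fraction of responses that contain their expected value."""
    hits = sum(exact_value_match(r, e) for r, e in zip(responses, expected_values))
    return hits / len(expected_values)


# Illustrative data only; these values are invented for the example.
grid = {"R101C202": (31.5, "mph"), "R101C203": (28.0, "mph")}
prompt = grid_to_prompt(grid, "What is the wind speed in cell R101C202?")

responses = ["The wind speed is 31.5 mph.", "About 30 mph."]
accuracy = exact_value_accuracy(responses, [31.5, 28.0])
print(prompt)
print(accuracy)
```

A check of this kind only inspects numbers, so it would not catch a correct value reported with the wrong unit; a stricter variant could also compare the unit string that follows the matched number.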
