Comparative Evaluation of Prompting and Fine-Tuning for Applying Large Language Models to Grid-Structured Geospatial Data

Akash Dhruv; Yangxinyu Xie; Jordan Branham; Tanwi Mallick

TL;DR

This study examines how large language models can interpret grid-structured geospatial data used in weather resilience planning, comparing zero-shot prompting with fine-tuning on a ClimRR-inspired dataset. It shows that a small open-weight model (LLaMA 3.1 with 8B parameters) can perform basic value extraction and scenario comparisons with prompting alone, but suffers from numeric and unit inconsistencies. Fine-tuning with LoRA on roughly 100 task-specific examples substantially improves both semantic alignment (0.8954) and exact-value accuracy (1.0), underscoring the importance of domain-specific adaptation for precise spatiotemporal reasoning. The results support practical, resource-efficient agentic workflows for real-time, region-aware weather interpretation, and the study outlines avenues for broadening the dataset and integrating real-time data streams via ClimRR APIs.

Abstract

This paper presents a comparative study of large language models (LLMs) in interpreting grid-structured geospatial data. We evaluate the performance of a base model through structured prompting and contrast it with a fine-tuned variant trained on a dataset of user-assistant interactions. Our results highlight the strengths and limitations of zero-shot prompting and demonstrate the benefits of fine-tuning for structured geospatial and temporal reasoning.
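
To make the notion of exact-value accuracy concrete, the following is a minimal sketch of how such a score could be computed. It is not the paper's evaluation code: the regular expression, the function names, the tolerance, and the sample responses are all illustrative assumptions. A response counts as correct only if it contains the reference number with the reference unit, so a unit inconsistency is scored as an error.

```python
import re

# Assumed pattern (not from the paper): a signed decimal number followed by an
# optional unit token made of letters or the symbols ° % /.
_VALUE_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*([A-Za-z°%/]*)")


def extract_values(text):
    """Return every (number, lowercased unit) pair found in a model response."""
    return [(float(num), unit.lower()) for num, unit in _VALUE_RE.findall(text)]


def is_exact_match(prediction, ref_value, ref_unit, tol=1e-6):
    """True if the response contains the reference value with the reference unit."""
    return any(
        abs(value - ref_value) <= tol and unit == ref_unit.lower()
        for value, unit in extract_values(prediction)
    )


def exact_value_accuracy(predictions, references):
    """Fraction of responses that reproduce their (value, unit) reference exactly."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    hits = sum(
        is_exact_match(pred, value, unit)
        for pred, (value, unit) in zip(predictions, references)
    )
    return hits / len(predictions)


# Made-up responses for illustration only; the second one reports a different
# number and unit than the reference, so it is counted as a miss.
sample_predictions = ["The value is 95.2 F.", "It is about 35.1 C."]
sample_references = [(95.2, "F"), (95.2, "F")]
sample_score = exact_value_accuracy(sample_predictions, sample_references)
```

Under this definition a score of 1.0 means every response reproduced both the number and the unit of its reference. Semantic alignment is a separate measure and is not covered by this sketch.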
