Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
Shambhavi Shanker, Manikandan Padmanaban, Jagabondhu Hazra
TL;DR
The paper tackles visual question answering on satellite imagery, where answering correctly requires geospatial reasoning, with a focus on climate applications. It introduces a three-stage pipeline that combines chain-of-thought data distillation, supervised fine-tuning with rationale augmentation, and direct preference optimization to align reasoning with user expectations. Empirical results show substantial gains: 18.19% improvement from CoT supervision, an additional 5.67% from DPO, and up to 34.9% improvement with full fine-tuning, achieving 82.77% accuracy on RSVQA and improved transfer to FloodNet (67.4%). The work improves interpretability and robustness in geospatial AI and supports climate use cases such as disaster monitoring and resilience planning, while highlighting remaining gaps in numerical counting and cross-dataset adaptation.
Abstract
Geospatial chain-of-thought (CoT) reasoning is essential for advancing Visual Question Answering (VQA) on satellite imagery, particularly in climate-related applications such as disaster monitoring, infrastructure risk assessment, urban resilience planning, and policy support. Existing VQA models enable scalable interpretation of remote sensing data but often lack the structured reasoning required for complex geospatial queries. We propose a VQA framework that integrates CoT reasoning with Direct Preference Optimization (DPO) to improve interpretability, robustness, and accuracy. By generating intermediate rationales, the model better handles tasks involving detection, classification, spatial relations, and comparative analysis, which are critical for reliable decision support in high-stakes climate domains. Experiments show that CoT supervision improves accuracy by 34.9% over direct baselines, while DPO yields additional gains in accuracy and reasoning quality. The resulting system advances VQA for multispectral Earth observation by enabling richer geospatial reasoning and more effective climate use cases.
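The abstract names Direct Preference Optimization without stating its objective. As a rough illustration only, the sketch below computes the standard DPO loss for a single preference pair (a preferred and a rejected rationale for the same image and question). It is not taken from the paper: the function name, argument names, and the value of beta are assumptions, and the paper's actual training setup may differ.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one reported in the source.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference agree on both responses, the loss equals log 2. It falls below that value as the policy shifts probability toward the preferred rationale relative to the reference, and rises above it when the policy shifts toward the rejected one.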
