CrisiSense-RAG: Crisis Sensing Multimodal Retrieval-Augmented Generation for Rapid Disaster Impact Assessment
Yiming Xiao, Kai Yin, Ali Mostafavi
TL;DR
CrisiSense-RAG addresses rapid, spatially resolved disaster impact assessment when data sources are temporally asynchronous. It introduces a split-pipeline multimodal retrieval-augmented generation framework in which a Text Analyst and a Visual Analyst reason separately and their outputs are combined by asynchronous fusion: real-time social reports are prioritized for flood extent, while post-event imagery is treated as persistent evidence of damage, all under metric-aligned generation. Zero-shot evaluation on Hurricane Harvey across three foundation-model backends shows competitive flood-extent and damage predictions (Extent MAE from 10.94% to 28.40%, Damage MAE from 16.47% to 21.65%), with prompt engineering improving results by up to 4.75 percentage points. The work demonstrates that general-purpose pretrained models can deliver practical, auditable resilience intelligence without event-specific fine-tuning, offering a deployable pathway for emergency management under real-world data constraints. It also outlines limitations and directions for future multi-hazard extensions and uncertainty quantification.
Abstract
Timely and spatially resolved disaster impact assessment is essential for effective emergency response. However, automated methods typically struggle with temporal asynchrony: real-time human reports capture peak hazard conditions, whereas high-resolution satellite imagery is frequently acquired after the peak and therefore often reflects flood recession rather than maximum extent. Naive fusion of these misaligned streams can yield dangerous underestimates when post-event imagery overrides documented peak flooding. We present CrisiSense-RAG, a multimodal retrieval-augmented generation framework that reframes impact assessment as evidence synthesis over heterogeneous data sources without disaster-specific fine-tuning. The system employs hybrid dense-sparse retrieval for text sources and CLIP-based retrieval for aerial imagery. A split-pipeline architecture feeds into asynchronous fusion logic that prioritizes real-time social evidence for peak flood extent while treating imagery as persistent evidence of structural damage. Evaluated on Hurricane Harvey across 207 ZIP-code queries, the framework achieves a flood extent MAE of 10.94% to 28.40% and a damage severity MAE of 16.47% to 21.65% in zero-shot settings. Prompt-level alignment proves critical for quantitative validity: metric grounding improves damage estimates by up to 4.75 percentage points. These results demonstrate a practical and deployable approach to rapid resilience intelligence under real-world data constraints.
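The asynchronous fusion idea and the MAE metric described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion rule, so the code assumes the simplest reading, namely that the text-derived estimate is used for flood extent whenever it exists and the imagery-derived estimate is used for damage whenever it exists, each falling back to the other modality otherwise. The names `AnalystEstimate`, `fuse`, and `mean_absolute_error` are hypothetical, and all values are percentages per ZIP code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AnalystEstimate:
    """Hypothetical per-ZIP output of one analyst, in percent (0-100)."""
    flood_extent: Optional[float] = None
    damage: Optional[float] = None


def fuse(text: AnalystEstimate, visual: AnalystEstimate) -> AnalystEstimate:
    """Assumed fusion rule: each metric has a preferred modality."""
    # Flood extent: prefer real-time text reports (peak conditions);
    # fall back to imagery only when no text estimate is available.
    extent = text.flood_extent if text.flood_extent is not None else visual.flood_extent
    # Damage: prefer post-event imagery (persistent evidence);
    # fall back to text only when no imagery estimate is available.
    damage = visual.damage if visual.damage is not None else text.damage
    return AnalystEstimate(flood_extent=extent, damage=damage)


def mean_absolute_error(pred: Sequence[float], truth: Sequence[float]) -> float:
    """MAE in percentage points over paired per-ZIP values."""
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("pred and truth must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


# Illustrative values only: imagery taken after recession suggests 15% extent,
# while peak-time reports suggest 60%. The fused extent keeps the peak value.
fused = fuse(AnalystEstimate(flood_extent=60.0, damage=10.0),
             AnalystEstimate(flood_extent=15.0, damage=35.0))
```

Under this assumed rule, post-event imagery can never lower a documented peak extent, which is the failure mode of naive fusion that the abstract warns about. A real system would likely weight evidence quality rather than apply a hard preference.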
