Recov-Vision: Linking Street View Imagery and Vision-Language Models for Post-Disaster Recovery

Yiming Xiao; Archit Gupta; Miguel Esparza; Yu-Hsuan Ho; Antonia Sebastian; Hannah Weas; Rose Houck; Ali Mostafavi

TL;DR

The paper tackles the challenge of building-level occupancy assessment after disasters by fusing street-view imagery with vision-language models to produce auditable parcel-level recovery trajectories. It introduces a street-level, language-guided pipeline (FacadeTrack/Recov-Vision) that rectifies facade views, extracts interpretable attributes, and supports two decision strategies: a transparent one-stage rule and a conservative two-stage reasoning design. Through field data from two Hurricane Helene campaigns, the two-stage approach yields higher recall and comparable overall agreement, reproducing ground-truth net recovery while exposing error pockets for QA. Spatial diagnostics reveal clustered residual errors, enabling targeted human review and scalable geospatial integration for emergency-management workflows.

Abstract

Building-level occupancy after disasters is vital for triage, inspections, utility re-energization, and equitable resource allocation. Overhead imagery provides rapid coverage but often misses facade and access cues that determine habitability, while street-view imagery captures those details but is sparse and difficult to align with parcels. We present FacadeTrack, a street-level, language-guided framework that links panoramic video to parcels, rectifies views to facades, and elicits interpretable attributes (for example, entry blockage, temporary coverings, localized debris) that drive two decision strategies: a transparent one-stage rule and a two-stage design that separates perception from conservative reasoning. Evaluated across two post-Hurricane Helene surveys, the two-stage approach achieves a precision of 0.927, a recall of 0.781, and an F1 score of 0.848, compared with a precision of 0.943, a recall of 0.728, and an F1 score of 0.822 for the one-stage baseline. Beyond accuracy, intermediate attributes and spatial diagnostics reveal where and why residual errors occur, enabling targeted quality control. The pipeline provides auditable, scalable occupancy assessments suitable for integration into geospatial and emergency-management workflows.
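As a quick consistency check on the reported metrics, the sketch below recomputes each F1 score from its precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name `f1_score` and the `reported` dictionary are illustrative and not part of the paper's code; only the six numbers come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "two-stage": (0.927, 0.781, 0.848),
    "one-stage": (0.943, 0.728, 0.822),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: computed F1 = {f1:.3f}, reported = {f1_reported:.3f}")
```

Under that assumption both computed values match the reported ones to three decimals, so the two-stage gain in F1 comes from higher recall (0.781 versus 0.728) at a small cost in precision (0.927 versus 0.943).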
