Recov-Vision: Linking Street View Imagery and Vision-Language Models for Post-Disaster Recovery

Yiming Xiao, Archit Gupta, Miguel Esparza, Yu-Hsuan Ho, Antonia Sebastian, Hannah Weas, Rose Houck, Ali Mostafavi

TL;DR

The paper addresses building-level occupancy assessment after disasters by combining street-view imagery with vision-language models to produce auditable, parcel-level recovery trajectories. It introduces a street-level, language-guided pipeline (FacadeTrack/Recov-Vision) that rectifies facade views, extracts interpretable attributes, and supports two decision strategies: a transparent one-stage rule and a conservative two-stage reasoning design. On field data from two post-Hurricane Helene survey campaigns, the two-stage approach achieves higher recall than the one-stage rule with comparable overall agreement, and it reproduces the ground-truth net recovery while exposing error pockets for QA. Spatial diagnostics show that residual errors are clustered, which enables targeted human review and scalable geospatial integration into emergency-management workflows.

Abstract

Building-level occupancy after disasters is vital for triage, inspections, utility re-energization, and equitable resource allocation. Overhead imagery provides rapid coverage but often misses facade and access cues that determine habitability, while street-view imagery captures those details but is sparse and difficult to align with parcels. We present FacadeTrack, a street-level, language-guided framework that links panoramic video to parcels, rectifies views to facades, and elicits interpretable attributes (for example, entry blockage, temporary coverings, localized debris) that drive two decision strategies: a transparent one-stage rule and a two-stage design that separates perception from conservative reasoning. Evaluated across two post-Hurricane Helene surveys, the two-stage approach achieves a precision of 0.927, a recall of 0.781, and an F1 score of 0.848, compared with the one-stage baseline at a precision of 0.943, a recall of 0.728, and an F1 score of 0.822. Beyond accuracy, intermediate attributes and spatial diagnostics reveal where and why residual errors occur, enabling targeted quality control. The pipeline provides auditable, scalable occupancy assessments suitable for integration into geospatial and emergency-management workflows.
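
The reported F1 scores can be checked against the reported precision and recall with the standard harmonic-mean definition, F1 = 2PR / (P + R). The short sketch below is only a consistency check on the abstract's numbers; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
two_stage_f1 = f1_score(0.927, 0.781)
one_stage_f1 = f1_score(0.943, 0.728)

# Rounded to three decimals, these match the reported 0.848 and 0.822.
print(round(two_stage_f1, 3), round(one_stage_f1, 3))
```

The two-stage design trades a small loss in precision (0.927 versus 0.943) for a larger gain in recall (0.781 versus 0.728), which is why its F1 score is higher.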

Paper Structure

This paper contains 24 sections, 3 equations, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: Study area in the broader Asheville area, North Carolina, showing residential parcels and drive routes used for panoramic data collection. Dots indicate buildings and lines indicate drive routes. Orange indicates routes and buildings from Visit 1, green those from Visit 2, and purple those from both visits.
  • Figure 2: Visualization of the view rectification process. (a) The top-down view illustrates the geometric parameters used for rectification, including the vehicle's position $(x_1, y_1)$ and heading $(\vec{\theta}_1)$, and the building's centroid $(x_2, y_2)$. The bearing to the building $(\vec{\theta}_2)$ is calculated, and the yaw angle $\alpha$ orients the rectified field of view (green) toward the building. (b) A raw 360-degree panoramic image is processed to generate a rectified, planar view of the building facade, which is then used for analysis.
  • Figure 3: Overview of the two prompting strategies for building occupancy classification. (a) Single-stage baseline: a vision-language model (VLM) extracts nine visual attributes, and a deterministic scoring rule with threshold $\tau$ produces the final label. (b) Two-stage strategy: the same VLM extracts the attributes; a text-only reasoning LLM applies explicit rules and few-shot exemplars to generate a conservative final decision.
  • Figure 4: Occupancy status overview for Ground Truth, Two-stage predictions, and One-stage predictions across both visits.
  • Figure 5: Confusion matrices for both visits, comparing model predictions to ground truth. Not Occupied is treated as the positive class.
  • ...and 5 more figures
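
The geometry in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the caption does not state: coordinates are planar (x east, y north, for example a projected CRS), angles are compass bearings in degrees clockwise from north, and the yaw $\alpha$ is the bearing to the building minus the vehicle heading, wrapped to [-180, 180). The function names are hypothetical.

```python
import math


def bearing_deg(x1: float, y1: float, x2: float, y2: float) -> float:
    """Compass bearing (degrees clockwise from north) from (x1, y1) to (x2, y2).

    Assumes planar coordinates with x pointing east and y pointing north.
    """
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360.0


def yaw_to_building(x1: float, y1: float, heading_deg: float,
                    x2: float, y2: float) -> float:
    """Yaw (degrees) that turns the view from the vehicle heading to the building.

    Assumption: yaw = bearing to building - vehicle heading, wrapped to [-180, 180).
    Positive values mean the building is to the right of the direction of travel.
    """
    theta2 = bearing_deg(x1, y1, x2, y2)
    return (theta2 - heading_deg + 180.0) % 360.0 - 180.0


# Vehicle at the origin heading north, building centroid 10 units due east:
# the rectified view should be rotated 90 degrees to the right.
alpha = yaw_to_building(0.0, 0.0, 0.0, 10.0, 0.0)
```

Wrapping the difference keeps the yaw small when the heading and the bearing straddle north (for example, heading 350 degrees and bearing 0 degrees give a yaw of 10 degrees rather than -350).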
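
The single-stage baseline in the Figure 3 caption (attributes scored by a deterministic rule with threshold $\tau$) might look like the sketch below. The actual scoring weights, the full list of nine attributes, and the value of $\tau$ are not given in this summary, so everything specific here is an assumption: each flagged attribute is counted with equal weight, the three attribute names are taken from the abstract's examples, and `tau=2` is a placeholder.

```python
from typing import Mapping


def one_stage_label(attributes: Mapping[str, bool], tau: int) -> str:
    """Label a building from binary facade attributes using a count threshold.

    Assumption: equal weights, i.e. the score is the number of flagged
    attributes. The paper's real scoring rule may weight attributes differently.
    """
    score = sum(1 for flagged in attributes.values() if flagged)
    return "Not Occupied" if score >= tau else "Occupied"


# Hypothetical VLM output for one facade; only three of the nine attributes
# are named in the abstract, so only those are shown.
facade = {
    "entry_blockage": True,
    "temporary_coverings": True,
    "localized_debris": False,
}

label = one_stage_label(facade, tau=2)  # tau=2 is a placeholder value
```

A rule of this form is easy to audit because every decision can be traced back to the specific attributes that were flagged, which is the transparency property the caption attributes to the baseline.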