Bayesian Modeling of Zero-Shot Classifications for Urban Flood Detection

Matt Franchi, Nikhil Garg, Wendy Ju, Emma Pierson

TL;DR

BayFlood is a two-stage framework: a zero-shot vision-language model (VLM) first detects urban flooding in unlabeled street images, and a Bayesian spatial model then quantifies uncertainty, smooths estimates across space, and incorporates external flood-risk covariates. The approach is validated on over 1 million dashcam images from NYC storms, supplemented with external data such as 311 reports, FloodNet sensors, stormwater maps, a digital elevation model (DEM), and American Community Survey (ACS) demographics. Results show a strong zero-shot flood signal, improved out-of-sample prediction, and robust performance with few ground-truth annotations; the inferred flood risk correlates with known risk markers and reveals biases in public reporting. Applied to New York City, BayFlood identifies flood-prone areas overlooked by existing methods and suggests sensor-placement strategies, illustrating practical impact for urban resilience and policy. More generally, the work demonstrates a paradigm for combining foundation-model annotations with Bayesian inference to obtain calibrated uncertainty and actionable insights without large labeled datasets.
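The TL;DR credits BayFlood with robust performance from few ground-truth annotations. One standard way a small annotated sample can be put to work, shown here only as an illustration and not as BayFlood's actual procedure, is to estimate the VLM's true-positive rate (TPR) and false-positive rate (FPR) from the annotations and then invert the observed positive rate q with the Rogan-Gladen correction, p = (q - FPR) / (TPR - FPR). The function names and the (vlm_label, human_label) pair format are hypothetical.

```python
def estimate_error_rates(annotations):
    """Estimate a classifier's TPR and FPR from a small hand-labeled sample.

    `annotations` is a list of hypothetical (vlm_says_flood, human_says_flood) pairs.
    """
    tp = sum(1 for vlm, human in annotations if vlm and human)
    fn = sum(1 for vlm, human in annotations if not vlm and human)
    fp = sum(1 for vlm, human in annotations if vlm and not human)
    tn = sum(1 for vlm, human in annotations if not vlm and not human)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one flooded and one non-flooded annotation")
    return tp / (tp + fn), fp / (fp + tn)


def corrected_prevalence(n_flagged, n_images, tpr, fpr):
    """Rogan-Gladen estimate of the true flood rate from the VLM's raw positive rate."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    if tpr <= fpr:
        raise ValueError("classifier must beat chance (tpr > fpr)")
    observed = n_flagged / n_images
    estimate = (observed - fpr) / (tpr - fpr)
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Usage with made-up numbers: 100 annotated images, then one tract's counts.
labels = [(True, True)] * 8 + [(False, True)] * 2 + [(True, False)] * 1 + [(False, False)] * 89
tpr, fpr = estimate_error_rates(labels)
print(round(tpr, 3), round(fpr, 3))  # 0.8 0.011
print(round(corrected_prevalence(50, 1000, tpr, fpr), 4))
```

Because the correction divides by TPR - FPR, noise in either estimated rate propagates directly into the prevalence; a fully Bayesian treatment would instead place priors on the error rates and carry their uncertainty through to the flood estimate.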

Abstract

Street scene datasets, collected from Street View or dashboard cameras, offer a promising means of detecting urban objects and incidents like street flooding. However, a major challenge in using these datasets is their lack of reliable labels: there are myriad types of incidents, many types occur rarely, and ground-truth measures of where incidents occur are lacking. Here, we propose BayFlood, a two-stage approach that circumvents this difficulty. First, we perform zero-shot classification of where incidents occur using a pretrained vision-language model (VLM). Second, we fit a spatial Bayesian model on the VLM classifications. The zero-shot approach avoids the need to annotate large training sets, and the Bayesian model provides desiderata that frequently arise in urban settings: principled measures of uncertainty, smoothing across locations, and incorporation of external data like stormwater accumulation zones. We comprehensively validate this two-stage approach, showing that VLMs provide strong zero-shot signal for floods across multiple cities and time periods, that the Bayesian model improves out-of-sample prediction relative to baseline methods, and that our inferred flood risk correlates with known external predictors of risk. Having validated our approach, we show it can be used to improve urban flood detection: our analysis reveals 113,738 people at high risk of flooding who are overlooked by current methods, identifies demographic biases in existing methods, and suggests locations for new flood sensors. More broadly, our results showcase Bayesian modeling of zero-shot LM annotations as a promising paradigm: it avoids the need to collect large labeled datasets and leverages the power of foundation models while providing the expressiveness and uncertainty quantification of Bayesian models.
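The second stage fits a spatial Bayesian model to the VLM classifications. That model is not specified in this section, so the sketch below swaps in a much simpler, non-spatial stand-in: an empirical-Bayes Beta-Binomial model that treats each tract's count of VLM-flagged images as binomial, fits one shared Beta prior by the method of moments, and reports a posterior mean and standard deviation per tract. It illustrates two of the desiderata named above, uncertainty quantification and pooling (shrinkage) across locations, but it ignores neighbor structure and external covariates. The `TractCounts` type and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class TractCounts:
    tract_id: str
    n_images: int   # dashcam images captured in the tract
    n_flagged: int  # images the VLM labeled as flooded


def fit_beta_prior(tracts, fallback_strength=2.0):
    """Method-of-moments Beta(alpha, beta) prior from raw per-tract rates.

    Crude: the spread of raw rates also contains binomial sampling noise,
    which weakens the fitted prior and so under-shrinks small tracts.
    """
    rates = [t.n_flagged / t.n_images for t in tracts]
    m = min(max(mean(rates), 1e-6), 1 - 1e-6)
    v = pvariance(rates)
    if 0 < v < m * (1 - m):
        strength = m * (1 - m) / v - 1  # alpha + beta
    else:
        strength = fallback_strength    # weak prior if the moments are degenerate
    return m * strength, (1 - m) * strength


def posterior(tract, alpha, beta):
    """Posterior mean and standard deviation of one tract's flood rate."""
    a = alpha + tract.n_flagged
    b = beta + tract.n_images - tract.n_flagged
    total = a + b
    post_mean = a / total
    post_sd = math.sqrt(a * b / (total ** 2 * (total + 1)))
    return post_mean, post_sd


# Usage with made-up counts.
tracts = [
    TractCounts("A", 100, 2),
    TractCounts("B", 400, 4),
    TractCounts("C", 10, 3),   # few images: its raw rate of 0.3 is shrunk hard
    TractCounts("D", 200, 0),  # zero flags still gets a small positive estimate
]
alpha, beta = fit_beta_prior(tracts)
for t in tracts:
    m, sd = posterior(t, alpha, beta)
    print(t.tract_id, round(t.n_flagged / t.n_images, 3), round(m, 3), round(sd, 3))
```

A spatial version would replace the single shared prior with one that couples neighboring tracts (for example, a conditional autoregressive prior on the logit of each tract's rate), which is what lets a model smooth across adjacent locations rather than only toward a citywide mean.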

Paper Structure

This paper contains 52 sections, 8 equations, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: Spatial distribution of dashcam images in our primary analysis dataset in New York City. Most Census tracts have one hundred to five hundred images.
  • Figure 2: Representative true and false positive flood classifications from the VLM. False negatives are extremely rare due to the low prevalence of flooding in the dataset.
  • Figure 3: BayFlood can identify locations at risk for flooding which are missed by three currently used methods. (a) Census tracts with high BayFlood risk, but no 311 flooding reports; (b) tracts with high BayFlood risk but no FloodNet sensors; (c) tracts with high BayFlood risk but no predicted stormwater accumulation; (d) tracts with high BayFlood risk and no signal from any of the three existing methods.
  • Figure 4: Demographic coefficients for the risk-adjusted regression reveal biases in 311 reporting patterns. 95% confidence intervals are plotted; all demographic features are z-scored, so coefficients are in units of standard deviations of each feature.
  • Figure 5: Existing FloodNet sensor locations (black diamonds), and suggested locations for new sensors (green crosshairs).
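Figure 3(d) and Figure 5 both come down to comparing BayFlood's tract-level risk against existing signals. A minimal sketch of that comparison follows, assuming a hypothetical per-tract record holding a posterior risk score plus one field for each existing method (311 reports, FloodNet sensors, predicted stormwater accumulation). The "high risk" cutoff (a top fraction by risk) and the greedy sensor rule (highest-risk tracts without a FloodNet sensor) are illustrative assumptions, not the paper's criteria.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    geoid: str
    risk: float               # posterior flood risk from the Bayesian stage
    n_311_reports: int        # 311 flooding complaints in the tract
    has_floodnet_sensor: bool
    in_stormwater_zone: bool  # predicted stormwater accumulation


def high_risk(tracts, top_fraction=0.1):
    """Return the top `top_fraction` of tracts by risk, highest first."""
    k = max(1, int(len(tracts) * top_fraction))
    return sorted(tracts, key=lambda t: t.risk, reverse=True)[:k]


def overlooked(tracts, top_fraction=0.1):
    """High-risk tracts with no signal from any of the three existing methods."""
    return [
        t for t in high_risk(tracts, top_fraction)
        if t.n_311_reports == 0 and not t.has_floodnet_sensor and not t.in_stormwater_zone
    ]


def suggest_sensor_sites(tracts, budget):
    """Greedy: the `budget` highest-risk tracts that lack a FloodNet sensor."""
    candidates = [t for t in tracts if not t.has_floodnet_sensor]
    return sorted(candidates, key=lambda t: t.risk, reverse=True)[:budget]


# Usage with made-up data: ten tracts with risks 0.1 ... 1.0.
tracts = [Tract(f"T{i}", i / 10, 0, False, False) for i in range(1, 11)]
tracts[9].has_floodnet_sensor = True  # riskiest tract (T10) already monitored
tracts[7].n_311_reports = 3           # third riskiest tract (T8) already reported
print([t.geoid for t in overlooked(tracts, top_fraction=0.3)])    # ['T9']
print([t.geoid for t in suggest_sensor_sites(tracts, budget=2)])  # ['T9', 'T8']
```

A practical placement rule would likely also discourage clustering sensors in adjacent tracts and weigh the posterior uncertainty, not just the mean; the greedy rule here ignores spatial coverage entirely.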