
High-Resolution Flood Probability Mapping Using Generative Machine Learning with Large-Scale Synthetic Precipitation and Inundation Data

Lipai Huang, Federico Antolini, Ali Mostafavi, Russell Blessing, Matthew Garcia, Samuel D. Brody

TL;DR

The paper tackles the challenge of producing high-resolution probabilistic flood maps in data-limited settings by introducing the Precipitation-Flood Depth Generative Pipeline, a surrogate ML framework that generates large-scale synthetic inundation data from CTGAN-generated precipitation conditioned on region-specific features. A cell-wise depth estimator (MaxFloodCast V2) trained on physics-based flood scenarios, combined with a structured sampling and smoothing pipeline, yields thousands of synthetic rainfall events and corresponding flood depths, enabling probabilistic maps across multiple depth thresholds. Key contributions include the cell-wise depth-estimator approach, a constrained CTGAN for precipitation generation, and an all-to-one event-sampling strategy that preserves nonlinearity while scaling to many events; validation shows the synthetic depth distributions closely resemble training data, and the resulting maps reveal meaningful spatial patterns of flood risk. This framework offers a scalable, region-adaptable tool for flood risk assessment and planning, with potential extensions to other regions and real-time forecasting contexts.

Abstract

High-resolution flood probability maps are instrumental for assessing flood risk, but their production is often limited by the availability of historical data. Additionally, producing the simulated data needed to create probabilistic flood maps with physics-based models requires significant computation and time, which limits its feasibility. To address this gap, this study introduces the Precipitation-Flood Depth Generative Pipeline, a novel methodology that uses generative machine learning to produce large-scale synthetic inundation data for probabilistic flood maps. With a focus on Harris County, Texas, the pipeline begins by training a cell-wise depth estimator on a number of precipitation-flood events modeled with a physics-based model. This cell-wise depth estimator, which emphasizes precipitation-based features, outperforms universal models. Subsequently, a Conditional Tabular Generative Adversarial Network (CTGAN) is used to conditionally generate a synthetic precipitation point cloud, which is filtered using strategic thresholds to align with realistic precipitation patterns. A precipitation feature pool is then constructed for each cell, enabling strategic sampling and the generation of synthetic precipitation events. After generating 10,000 synthetic events, flood probability maps are created for various inundation depths. Validation using similarity and correlation metrics confirms the accuracy of the synthetic depth distributions. The Precipitation-Flood Depth Generative Pipeline provides a scalable way to generate the synthetic flood depth data needed for high-resolution flood probability maps, which can enhance flood mitigation planning.
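The final step described above, turning many synthetic events into flood probability maps for several inundation depths, can be illustrated with a minimal sketch. It assumes the per-cell probability is the fraction of synthetic events whose depth exceeds a threshold; the function name `flood_probability_maps`, the strict `>` comparison, the threshold values, and the gamma-distributed stand-in depths are all hypothetical and are not taken from the paper.

```python
import numpy as np

def flood_probability_maps(depths, thresholds):
    """Per-cell exceedance probability for each depth threshold.

    depths: array-like of shape (n_events, n_cells), one flood depth
            per synthetic event and cell (assumed layout).
    Returns a dict mapping threshold -> array of shape (n_cells,).
    """
    depths = np.asarray(depths, dtype=float)
    # Fraction of events in which each cell's depth exceeds the threshold.
    return {t: (depths > t).mean(axis=0) for t in thresholds}

# Hypothetical stand-in for depth-estimator output:
# 10,000 synthetic events over 5 cells, depths in feet.
rng = np.random.default_rng(0)
synthetic_depths = rng.gamma(shape=1.5, scale=0.8, size=(10_000, 5))

maps = flood_probability_maps(synthetic_depths, thresholds=[0.5, 1.0, 2.0])
```

Because each map is an empirical exceedance frequency, the probability for a deeper threshold can never be larger than the probability for a shallower one in the same cell.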

Paper Structure (20 sections, 9 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The main research workflow comprises four key steps. Step I involves training and selecting the optimal depth estimation model to generate synthetic flood depth data. The trained model is later applied in Step III for synthetic flood depth construction. Step II focuses on sampling events, preprocessing point-level features, and training a CTGAN with specific constraints to generate a synthetic point cloud through conditional sampling. In Step III, a precipitation feature-level pool is created for each cell mesh using thresholds derived from sampled training event features, enabling strategic sampling to generate synthetic precipitation event features. The trained depth estimator from Step I then processes KNN-smoothed synthetic precipitation features along with static features to produce synthetic flood depths. Finally, Step IV iterates Step III to generate thousands of synthetic precipitation events, forming a synthetic depth distribution and ultimately constructing a synthetic flood probability map.
  • Figure 2: Study area and Harris County Flood Control District gauge distribution. Map generated using ArcGIS Pro 3.0.0 (https://pro.arcgis.com/).
  • Figure 3: (I) Grid search results for CTGAN hyperparameters. The optimal hyperparameter set was selected based on the highest average marginal distribution scores across all synthetic precipitation-based features generated by the best-performing CTGAN configurations. The key hyperparameters explored in the grid search include the generator and discriminator learning rates, as well as the number of training epochs. (II) Stacked marginal distribution comparison of three synthetic features between training data and synthetic data: (a) cumulative precipitation, (b) peak precipitation, and (c) duration. The distribution of the training dataset is shown in gray, and the synthetic distribution is shown in light blue. Distributions were generated using the Synthetic Data Vault (SDV).
  • Figure 4: Synthetic flood event with 15 hours of global precipitation. All attribute maps use the same scale and share one color bar, with different units: inches, inches, hours, and feet, respectively. The synthetic cumulative precipitation and peak precipitation are processed by a 50-NN smoother. Maps generated with the GeoPandas Python package.
  • Figure 5: Histograms of Flood Assessment Metrics: Comparison of metrics between channel cells and non-channel cells, based on the differences between the sampled training depth distribution and the downsampled synthetic depth distribution.
  • ...and 1 more figures
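The KNN smoothing mentioned in the Figure 1 and Figure 4 captions can be sketched as follows. This is a minimal illustration only: it assumes the smoother replaces each cell's value with the unweighted mean over its k nearest cells (the cell itself included), measured by Euclidean distance between cell centroids. The function name `knn_smooth`, the brute-force distance computation, and the 10 x 10 demo grid are hypothetical; the paper may weight neighbors or define neighborhoods differently.

```python
import numpy as np

def knn_smooth(coords, values, k=50):
    """Replace each cell value with the mean of its k nearest cells.

    coords: (n_cells, 2) cell centroid coordinates (assumed layout).
    values: (n_cells,) feature values, e.g. cumulative precipitation.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    k = min(k, len(coords))
    # Brute-force pairwise squared distances; fine for a small demo grid.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Indices of the k nearest cells for every cell (self is at distance 0).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return values[nearest].mean(axis=1)

# Hypothetical 10 x 10 grid of cell centroids with noisy feature values.
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()])
rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=2.0, size=len(coords))

smoothed = knn_smooth(coords, raw, k=50)
```

For grids with many cells, a spatial index such as a k-d tree would avoid the quadratic memory cost of the pairwise distance matrix used here.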