Evaluation of Performance Measures for Qualifying Flood Models with Satellite Observations

Jean-Paul Travert, Sébastien Boyaval, Cédric Goeury, Vito Bacchi, Fabrice Zaoui

TL;DR

This paper addresses how to select performance measures for comparing 2D flood-model simulations with satellite flood maps, introducing a metaverification approach with four criteria (magnitude sensitivity, displacement sensitivity, noise sensitivity, and computation time). Using a Garonne River case (Feb 2021) and a 92-zone friction parameterization within a 2D shallow-water model solved by TELEMAC-2D, the authors evaluate 28 candidate measures spanning pixel-to-pixel and geometric similarities, then rank and correlate them to identify robust metrics. The methodology identifies eight surviving measures and highlights four top performers ($\kappa$, $MCC$, $NMI$, $d_{MH}$) that prove most robust to observation errors, with a practical calibration example using $NMI$. The study provides a transferable framework for choosing flood-map metrics applicable to other flood events and models, informing calibration, validation, and data-assimilation workflows under observation-imposed uncertainties.

Abstract

This work discusses how to choose performance measures to compare numerical simulations of a flood event with one satellite image, e.g., in a model calibration or validation procedure. A series of criteria is proposed to evaluate the sensitivity of performance measures with respect to the flood extent, satellite characteristics (position, orientation), and measurement/processing errors (satellite raw values or extraction of the flood maps). Their relevance is discussed numerically in the case of one flooding event (on the Garonne River in France in February 2021), using a distribution of water depths simulated from a shallow-water model parameterized by an uncertain friction field. After identifying the performance measures respecting the most criteria, a correlation analysis is carried out to identify how similar the various performance measures are. Then, a methodology is proposed to rank performance measures and select those most robust to observation errors. The methodology is shown to be useful for identifying four performance measures out of 28 in the study case. Note that the various top-ranked performance measures do not lead to the same calibration result for the friction field of the shallow-water model. The methodology can be applied to the comparison of any flood model with any flood event.
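To make the idea of a pixel-to-pixel performance measure concrete, the sketch below computes two of the top-ranked measures named in the TL;DR, Cohen's $\kappa$ and $MCC$, from a simulated and an observed binary flood map. It uses the standard textbook definitions based on the contingency table; the exact definitions and thresholding used in the paper are not shown in this section, so the function names, the 1 = wet / 0 = dry encoding, and the handling of degenerate cases are assumptions made for illustration only.

```python
from math import sqrt


def contingency(sim, obs):
    """Count (tp, fp, fn, tn) for two equally sized binary maps.

    Assumed encoding (not from the paper): 1 = wet pixel, 0 = dry pixel.
    `sim` and `obs` are lists of rows with identical shapes.
    """
    tp = fp = fn = tn = 0
    for row_s, row_o in zip(sim, obs):
        for s, o in zip(row_s, row_o):
            if s and o:
                tp += 1      # wet in both maps
            elif s:
                fp += 1      # wet in simulation only
            elif o:
                fn += 1      # wet in observation only
            else:
                tn += 1      # dry in both maps
    return tp, fp, fn, tn


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    if p_e == 1.0:
        return 0.0           # degenerate case; returning 0 is a convention chosen here
    return (p_o - p_e) / (1.0 - p_e)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient of the 2x2 contingency table."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0           # degenerate case; returning 0 is a convention chosen here
    return (tp * tn - fp * fn) / denom


# Toy 2x3 maps, purely illustrative (not data from the study).
sim_map = [[1, 1, 0],
           [1, 0, 0]]
obs_map = [[1, 0, 0],
           [1, 1, 0]]

table = contingency(sim_map, obs_map)
print(table, cohen_kappa(*table), mcc(*table))
```

Both measures equal 1 when the two maps coincide and are close to 0 when the agreement is no better than chance. The other two top-ranked measures, $NMI$ and $d_{MH}$, are not covered by this sketch.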

Paper Structure (26 sections, 19 equations, 17 figures, 7 tables)

Figures (17)

  • Figure 1: Visualization of the study area on the Garonne River in France.
  • Figure 2: Measured discharge at the gauging stations (Tonneins, Marmande, and La Réole) and Sentinel-1 acquisition times during February 2021 flood event.
  • Figure 3: Garonne study area and unstructured triangular mesh. The red dots are in-situ gauging stations and the colored subdomains represent the floodplain friction zones.
  • Figure 4: Example of flood maps (on the Garonne River) for two thresholds: $h_{20}$ (left), and $h_{50}$ (right).
  • Figure 5: Box plots of $M$ samples $\{\mathcal{L}(S^{h_{50}}(\omega_{m}),S^{h_{l}}(\omega_{m}))\}_{1\leq m \leq M}$ functions of $l\in \{10; ...; 90\}$ for $\mathcal{L}\in \{FPR; ...; FSS_{7}\}$.
  • ...and 12 more figures