Assessment of a new GeoAI foundation model for flood inundation mapping

Wenwen Li, Hyunho Lee, Sizhe Wang, Chia-Yu Hsu, Samantha T. Arundel

TL;DR

The paper assesses IBM-NASA's geospatial foundation model Prithvi for flood inundation mapping, comparing it against U-Net and Segformer on the Sen1Floods11 benchmark. Prithvi transfers well to unseen regions, outperforming both baselines on the Bolivia dataset, while U-Net and Segformer perform better in-region owing to their multi-scale features. Prithvi's limitations include single-level feature extraction, a requirement for six specific spectral bands, and the lack of an end-to-end segmentation head. The work highlights the potential of geospatial foundation models for GeoAI and outlines concrete directions for wider adoption: multi-scale representation, end-to-end pipelines, and broader input-band support.

Abstract

Vision foundation models are a new frontier in Geospatial Artificial Intelligence (GeoAI), an interdisciplinary research area that applies and extends AI for geospatial problem solving and geographic knowledge discovery, because of their potential to enable powerful image analysis by learning and extracting important image features from vast amounts of geospatial data. This paper evaluates the performance of the first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, to support a crucial geospatial analysis task: flood inundation mapping. This model is compared with convolutional neural network and vision transformer-based architectures in terms of mapping accuracy for flooded areas. A benchmark dataset, Sen1Floods11, is used in the experiments, and the models' predictability, generalizability, and transferability are evaluated based on both a test dataset and a dataset that is completely unseen by the model. Results show the good transferability of the Prithvi model, highlighting its performance advantages in segmenting flooded areas in previously unseen regions. The findings also indicate areas for improvement for the Prithvi model in terms of adopting multi-scale representation learning, developing more end-to-end pipelines for high-level image analysis tasks, and offering more flexibility in terms of input data bands.

Paper Structure (11 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: The architecture of the geospatial foundation model Prithvi, tailored for semantic segmentation.
  • Figure 2: Architecture of U-Net. H: height. W: width. C: channel.
  • Figure 3: Model architecture of Segformer (adapted from Xie et al., 2021). H: height. W: width. C: channel.
  • Figure 4: Visual comparison of prediction results. The images in rows (a) and (b) are from the test dataset, and the image in row (c) is from the unseen Bolivia dataset. "Label" indicates ground-truth labels. White: flood; Black: non-flood; Gray: no data. S2-FCC denotes Sentinel-2 False Color Composite. The red boxes highlight the comparative regions in terms of model segmentation results.
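
The Figure 4 caption distinguishes three label classes: flood, non-flood, and no data. As a minimal sketch of how a predicted mask could be scored against such labels, the snippet below computes intersection over union (IoU) for the flood class and skips no-data pixels. The choice of IoU, the function name flood_iou, and the label encoding (1 = flood, 0 = non-flood, -1 = no data) are assumptions made for illustration, not details taken from the text above.

```python
from typing import Sequence

# Assumed label encoding, for illustration only.
FLOOD = 1      # flooded pixel
NON_FLOOD = 0  # non-flooded pixel
NO_DATA = -1   # pixel without ground truth


def flood_iou(pred: Sequence[Sequence[int]], label: Sequence[Sequence[int]]) -> float:
    """Intersection over union for the flood class, skipping no-data pixels."""
    intersection = 0
    union = 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            if t == NO_DATA:
                continue  # no ground truth here, so the pixel is not scored
            if p == FLOOD and t == FLOOD:
                intersection += 1
            if p == FLOOD or t == FLOOD:
                union += 1
    # If neither mask contains any flood pixel, treat it as perfect agreement.
    return intersection / union if union else 1.0


# Toy 2x3 example: one label pixel is no data and is ignored.
label = [
    [1, 1, 0],
    [0, -1, 1],
]
pred = [
    [1, 0, 0],
    [0, 1, 1],
]
score = flood_iou(pred, label)
print(f"flood IoU: {score:.3f}")
```

In the toy example, two pixels are flood in both masks and three are flood in at least one, so the score is 2/3. The prediction at the no-data position does not count either way.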