AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments
Georgios Simantiris, Konstantinos Bacharidis, Apostolos Papanikolaou, Petros Giannakakis, Costas Panagiotakis
TL;DR
Flood detection from visual data is hampered by a lack of large, diverse, geospatially rich datasets. AIFloodSense introduces a global UAV-based benchmark (470 images, 230 flood events, 64 countries, six continents) with pixel-level masks for Flood, Sky, and Building plus continent labels, supporting image classification, semantic segmentation, and VQA. Baseline experiments with CNNs and Transformers show how difficult the dataset is and how useful it can be for domain-generalized, multi-task flood analysis, including cross-continental transfer and explainable reasoning. The resource aims to accelerate robust, interpretable flood understanding and disaster response through open access and standardized benchmarks.
Abstract
Accurate flood detection from visual data is a critical step toward improving disaster response and risk assessment, yet datasets for flood segmentation remain scarce due to the challenges of collecting and annotating large-scale imagery. Existing resources are often limited in geographic scope and annotation detail, hindering the development of robust, generalizable computer vision methods. To bridge this gap, we introduce AIFloodSense, a comprehensive, publicly available aerial imagery dataset comprising 470 high-resolution images from 230 distinct flood events across 64 countries and six continents. Unlike prior benchmarks, AIFloodSense ensures global diversity and temporal relevance (2022-2024), supporting three complementary tasks: (i) Image Classification with novel sub-tasks for environment type, camera angle, and continent recognition; (ii) Semantic Segmentation providing precise pixel-level masks for flood, sky, and buildings; and (iii) Visual Question Answering (VQA) to enable natural language reasoning for disaster assessment. We establish baseline benchmarks for all tasks using state-of-the-art CNN and Transformer architectures, demonstrating the dataset's complexity and its value in advancing domain-generalized AI tools for climate resilience.
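As an illustration of how predictions on the three annotated classes might be scored, the sketch below computes per-class intersection-over-union (IoU) and its mean, a common segmentation metric. It is not taken from the paper: the abstract does not state the evaluation protocol or the mask file format, so the integer label encoding (`0` background, `1` flood, `2` sky, `3` building), the function names, and the toy masks are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical label encoding (assumption): 0 = background, 1 = flood, 2 = sky, 3 = building.
CLASSES = {1: "flood", 2: "sky", 3: "building"}

Mask = List[List[int]]  # a 2-D grid of integer class labels


def per_class_iou(pred: Mask, target: Mask) -> Dict[str, Optional[float]]:
    """Return IoU for each class; None if the class is absent from both masks."""
    inter = {c: 0 for c in CLASSES}
    union = {c: 0 for c in CLASSES}
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            for c in CLASSES:
                in_pred, in_target = p == c, t == c
                if in_pred and in_target:
                    inter[c] += 1
                if in_pred or in_target:
                    union[c] += 1
    return {
        name: (inter[c] / union[c] if union[c] else None)
        for c, name in CLASSES.items()
    }


def mean_iou(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the IoU over classes that appear in at least one mask."""
    valid = [v for v in scores.values() if v is not None]
    return sum(valid) / len(valid) if valid else None


# Toy 3x4 masks (made up for illustration, not dataset samples).
target = [
    [2, 2, 2, 2],
    [3, 3, 0, 0],
    [1, 1, 1, 1],
]
pred = [
    [2, 2, 2, 0],
    [3, 3, 3, 0],
    [1, 1, 1, 0],
]

scores = per_class_iou(pred, target)
print(scores)
print(mean_iou(scores))
```

Classes missing from both masks are skipped in the mean rather than counted as zero, which keeps images without sky or buildings from dragging the average down; whether the benchmark handles such cases this way is not stated in the abstract.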
