MultiFloodSynth: Multi-Annotated Flood Synthetic Dataset Generation
YoonJe Kang, Yonghoon Jung, Wonseop Shin, Bumsoo Kim, Sanghyun Seo
TL;DR
This work tackles data scarcity in flood hazard detection by introducing MultiFloodSynth, a parameter-controllable synthetic pipeline that uses a 3D urban engine and image-to-3D tools to generate a $5$-level flood dataset with $9$ annotation types. The dataset comprises $70{,}117$ images and achieves a Realistic Score of $93.17\%$ relative to real data, indicating strong realism. Experiments show that combining real and synthetic data yields the best detection performance, reducing labeling and data collection burdens while supporting multiple CV tasks through rich annotations. The approach offers a practical, scalable resource for urban flood hazard systems with implications for improved hazard detection and interpretability via XAI techniques like EigenCAM.
Abstract
In this paper, we present a synthetic data generation framework for flood hazard detection systems. To achieve high fidelity and quality, we model several real-world properties in a virtual world and simulate flood situations by controlling them. For efficiency, we leverage recent generative models for image-to-3D conversion and urban city synthesis to composite flood environments easily, which avoids the data bias that hand-crafted scene construction can introduce. Based on this framework, we build a synthetic flood dataset with 5 levels, dubbed MultiFloodSynth, which contains rich annotation types such as normal maps, segmentation, and 3D bounding boxes for a variety of downstream tasks. In experiments, our dataset demonstrates enhanced flood hazard detection performance, with realism on par with real datasets.
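The idea of training on a combination of real and synthetic images can be illustrated with a minimal sketch. This is not the authors' training code: the function name `mix_real_and_synthetic`, the `synthetic_fraction` parameter, and the file paths are hypothetical and chosen only for illustration. The sketch keeps every real sample and draws just enough synthetic samples to reach the requested share of the mixed set.

```python
import random
from typing import List, Sequence, Tuple

# (image_path, source_tag); the tag lets later analysis separate the two sources.
Sample = Tuple[str, str]


def mix_real_and_synthetic(
    real: Sequence[str],
    synthetic: Sequence[str],
    synthetic_fraction: float = 0.5,  # hypothetical knob, not from the paper
    seed: int = 0,
) -> List[Sample]:
    """Build a shuffled training list that keeps all real samples and adds
    synthetic ones until they make up about `synthetic_fraction` of the set."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")

    rng = random.Random(seed)
    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    n_syn = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    # Never request more synthetic samples than are available.
    n_syn = min(n_syn, len(synthetic))

    chosen = rng.sample(list(synthetic), n_syn)
    mixed = [(p, "real") for p in real] + [(p, "synthetic") for p in chosen]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real_paths = [f"real/{i:04d}.jpg" for i in range(100)]        # placeholder paths
    synth_paths = [f"synthetic/{i:05d}.png" for i in range(1000)]  # placeholder paths
    train = mix_real_and_synthetic(real_paths, synth_paths, synthetic_fraction=0.5)
    print(len(train))
```

Fixing the seed makes the sampled subset reproducible, which helps when comparing detectors trained on different real-to-synthetic ratios.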
