Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data

Tarun Kalluri, Jihyeon Lee, Kihyuk Sohn, Sahil Singla, Manmohan Chandraker, Joseph Xu, Jeremiah Liu

TL;DR

This work tackles the problem of poor cross-domain robustness in aerial disaster assessment when labeled post-disaster data are scarce. It introduces a scalable pipeline that uses mask-guided text-to-image editing (via the MUSE model) to synthesize post-disaster imagery conditioned on target-domain pre-disaster images, paired with a simple two-stage training regime that leverages source-domain labels and synthetic target data. Empirical results on xBD and SKAI demonstrate significant improvements over source-only baselines in both single-source and multi-source transfer settings, with gains up to roughly 29% in AUPRC on challenging cross-geography transfers. The approach enables rapid, low-cost generation of target-domain supervision and yields practical robustness gains for disaster response in under-resourced geographies. Results remain sensitive to the quality of the generated imagery, and more advanced filtering or domain-specific generator tuning may bring further benefits.

Abstract

We present a simple and efficient method to leverage emerging text-to-image generative models in creating large-scale synthetic supervision for the task of damage assessment from aerial images. While significant recent advances have resulted in improved techniques for damage assessment using aerial or satellite imagery, they still suffer from poor robustness to domains where manually labeled data is unavailable, directly impacting post-disaster humanitarian assistance in such under-resourced geographies. Our contribution towards improving domain robustness in this scenario is two-fold. Firstly, we leverage the text-guided mask-based image editing capabilities of generative models and build an efficient and easily scalable pipeline to generate thousands of post-disaster images from low-resource domains. Secondly, we propose a simple two-stage training approach to train robust models while using manual supervision from different source domains along with the generated synthetic target domain data. We validate the strength of our proposed framework in cross-geography domain transfer settings using xBD and SKAI images, in both single-source and multi-source configurations, achieving significant improvements over a source-only baseline in each case.

Paper Structure (24 sections, 11 figures, 4 tables)

This paper contains 24 sections, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Summary of our proposed pipeline. A disaster assessment model trained using labeled data from a different domain suffers from poor accuracy due to significant distribution shifts with the target. We offer a novel way of addressing this limitation, by leveraging the recent advances in mask-based text-to-image models (Chang et al., 2023) to generate thousands of synthetic labeled samples from the target domain, where only pre-disaster images are accessible. We incorporate this synthetic data along with source labeled data in a two-stage training framework to achieve significant gains on challenging transfer settings from the xBD (Gupta et al., 2019) and SKAI (Lee et al., 2020) datasets.
  • Figure 2: Overview of the proposed synthetic data generation pipeline. We first pass the pre-disaster image $U_t$ from the target domain through a pre-trained VQGAN encoder followed by a tokenizer to compute the latent tokens, which are then masked using a binary mask. We use the MUSE model along with a suitable text prompt $T$ to predict the output tokens from the masked tokens $u$, which are then de-tokenized to generate the post-disaster image $\hat{V}_t$. Our augmented dataset $\hat{D}_t$ now contains the input image $U_t$, the generated image $\hat{V}_t$, and the binary label corresponding to the text prompt (indicating damage or no damage).
  • Figure 3: Prompt Pool for SKAI
  • Figure 4: Prompt Pool for xBD
  • Figure 5: Hurricane Ian
  • ...and 6 more figures
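
The token-masking step described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation and does not call the MUSE or VQGAN models: the token grid is a plain list of integers, `MASK_ID` is a hypothetical sentinel, `predict_fn` is a stand-in for the text-conditioned generator, and all function names are assumptions made for illustration. The sketch only shows how a pixel-level binary mask could be mapped onto a token grid, how masked tokens could be replaced by predictions while unmasked tokens are kept, and how a labeled record could be assembled.

```python
from typing import Callable, List

Grid = List[List[int]]
MASK_ID = -1  # hypothetical sentinel; a real tokenizer reserves its own mask id


def mask_to_token_grid(pixel_mask: Grid, patch: int) -> List[List[bool]]:
    """Downsample a binary pixel mask to the token grid.

    A token counts as masked if any pixel inside its patch is masked.
    """
    h, w = len(pixel_mask), len(pixel_mask[0])
    if h % patch or w % patch:
        raise ValueError("mask size must be a multiple of the patch size")
    return [
        [
            any(
                pixel_mask[r][c]
                for r in range(i * patch, (i + 1) * patch)
                for c in range(j * patch, (j + 1) * patch)
            )
            for j in range(w // patch)
        ]
        for i in range(h // patch)
    ]


def apply_token_mask(tokens: Grid, token_mask: List[List[bool]]) -> Grid:
    """Replace every masked token with MASK_ID; keep the rest unchanged."""
    return [
        [MASK_ID if m else t for t, m in zip(token_row, mask_row)]
        for token_row, mask_row in zip(tokens, token_mask)
    ]


def fill_masked_tokens(masked: Grid, predict_fn: Callable[[Grid], Grid]) -> Grid:
    """Take predictions only at masked positions.

    predict_fn is a placeholder for a text-conditioned token predictor.
    """
    predicted = predict_fn(masked)
    return [
        [p if t == MASK_ID else t for t, p in zip(token_row, pred_row)]
        for token_row, pred_row in zip(masked, predicted)
    ]


def make_record(pre_image, post_image, damaged: bool) -> dict:
    """Bundle the pre-disaster input, generated output, and binary label."""
    return {"pre": pre_image, "post": post_image, "label": int(damaged)}
```

In this sketch the binary label comes from the choice of prompt (damage or no damage), as the caption describes, rather than from manual annotation of the generated image.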