Learning Efficient Unsupervised Satellite Image-based Building Damage Detection

Yiyun Zhang; Zijian Wang; Yadan Luo; Xin Yu; Zi Huang

TL;DR

A novel self-supervised framework, U-BDD++, is presented, which improves upon the U-BDD baseline by addressing domain-specific issues associated with satellite imagery.

Abstract

Existing Building Damage Detection (BDD) methods typically require labour-intensive pixel-level annotations of buildings and their conditions, which largely limits their applicability. In this paper, we investigate a challenging yet practical BDD scenario, Unsupervised Building Damage Detection (U-BDD), in which only unlabelled pre- and post-disaster satellite image pairs are provided. As a pilot study, we first propose an advanced U-BDD baseline that leverages pre-trained vision-language foundation models (i.e., Grounding DINO, SAM and CLIP) to address the U-BDD task. However, the apparent domain gap between satellite and generic images leads the foundation models to identify buildings and their damage with low confidence. In response, we further present a novel self-supervised framework, U-BDD++, which improves upon the U-BDD baseline by addressing domain-specific issues associated with satellite imagery. Furthermore, the new Building Proposal Generation (BPG) module and the CLIP-enabled noisy Building Proposal Selection (CLIP-BPS) module in U-BDD++ ensure high-quality self-training. Extensive experiments on the widely used building damage assessment benchmark demonstrate the effectiveness of the proposed method for unsupervised building damage detection. The presented annotation-free, foundation model-based paradigm ensures an efficient learning phase. This study opens a new direction for real-world BDD and sets a strong baseline for future research.

Paper Structure (17 sections, 11 equations, 6 figures, 5 tables)

Introduction
Related Work
U-BDD: A Baseline Approach
Overview
Building Localisation
Damage Classification
U-BDD++: An Improved Approach
Overview
BPG Module
CLIP-BPS Module
Self-training Module
Experiment
Dataset
Building Localisation
Damage Classification
...and 2 more sections

Figures (6)

Figure 1: High-level illustration of a U-BDD approach. Given a pre-disaster satellite image (top-left) and the corresponding post-disaster image (top-right) of an area of interest, pre-trained foundation models can be applied to produce building localisation (bottom-left) and damage classification (bottom-right) masks.
Figure 2: The schema of the U-BDD baseline. The pre-disaster images are processed through Grounding DINO and SAM, yielding a building segmentation mask. Concurrently, pre- and post-disaster image pairs, along with the bounding boxes predicted from Grounding DINO, are passed to the CLIP model for damage assessment per building. The final evaluation mask is obtained by integrating the building segmentation mask with the damage predictions.
Figure 3: The proposed workflow of U-BDD++. U-BDD++ extends the U-BDD baseline to include fine-tuning of the foundation models in both stages. To address the domain shift issue in the satellite imagery, two specifically designed modules, BPG and CLIP-BPS, provide the model in the building localisation stage with high-quality initial fine-tuning supervision.
Figure 4: Visual demonstration of the CLIP-BPS module. Multiscale merging and preliminary filters remove duplicate or incorrectly sized bounding boxes from the BPG module, while preserving most buildings. Merging and filtering can be performed simultaneously; for visual clarity, multiscale merging is shown before the preliminary filters. Finally, the CLIP filter removes false positives and large incorrect predictions with similar semantic traits, such as tennis courts.
Figure 5: Visual predictions from the U-BDD baseline and U-BDD++ on end-to-end building localisation and damage classification. The columns show, from left to right, the pre-disaster images, the post-disaster images, the baseline evaluation mask, the U-BDD++ evaluation mask and the ground truth evaluation mask. The baseline and U-BDD++ classify buildings as undamaged (green) or damaged (orange). Ground truth masks further classify buildings as no damage (green), minor damage (yellow), major damage (orange) and destroyed (red). Note that the different damage levels in the ground truth masks are for reference only, and all damage levels except no damage are treated as damaged in U-BDD. Both the pre-trained Grounding DINO in the baseline and the fine-tuned DINO in U-BDD++ use a prediction box threshold $\sigma_{G}$ of 0.15.
...and 1 more figures
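
The captions of Figures 2 and 5 describe three small steps that can be made concrete: discarding boxes below the threshold $\sigma_{G}$, integrating the building segmentation mask with per-building damage predictions, and collapsing the four ground truth damage levels into undamaged or damaged. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Detection` type, the integer mask encoding, the box convention and the use of a strict `<` comparison against the threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

BOX_THRESHOLD = 0.15  # sigma_G as stated in the Figure 5 caption

# Ground truth damage levels named in the Figure 5 caption.
GT_LEVELS = ("no damage", "minor damage", "major damage", "destroyed")

# Hypothetical integer encoding of the evaluation mask (an assumption).
BACKGROUND, UNDAMAGED, DAMAGED = 0, 1, 2


@dataclass
class Detection:
    """Hypothetical container for one predicted building."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
    score: float                    # detector confidence for the box
    damaged: bool                   # per-building damage decision


def to_binary_label(gt_level: str) -> int:
    """Collapse a ground truth level: everything except 'no damage' is damaged."""
    if gt_level not in GT_LEVELS:
        raise ValueError(f"unknown damage level: {gt_level!r}")
    return UNDAMAGED if gt_level == "no damage" else DAMAGED


def evaluation_mask(
    building_mask: List[List[int]],
    detections: List[Detection],
    threshold: float = BOX_THRESHOLD,
) -> List[List[int]]:
    """Paint each confident box's damage label onto the building pixels it covers."""
    height = len(building_mask)
    width = len(building_mask[0]) if height else 0
    out = [[BACKGROUND] * width for _ in range(height)]
    for det in detections:
        if det.score < threshold:  # assumption: boxes below sigma_G are dropped
            continue
        x0, y0, x1, y1 = det.box
        label = DAMAGED if det.damaged else UNDAMAGED
        # Clip the box to the image and label only pixels segmented as building.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                if building_mask[y][x]:
                    out[y][x] = label
    return out


if __name__ == "__main__":
    mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    dets = [
        Detection(box=(0, 0, 2, 2), score=0.90, damaged=True),
        Detection(box=(3, 1, 4, 3), score=0.10, damaged=False),  # below sigma_G
    ]
    for row in evaluation_mask(mask, dets):
        print(row)
```

In this toy input the second box falls below the threshold, so the building pixels it covers stay as background. Lowering `threshold` would label them undamaged instead, which illustrates how the choice of $\sigma_{G}$ trades recall against false positives in the final mask.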
