Post-Disaster Affected Area Segmentation with a Vision Transformer (ViT)-based EVAP Model using Sentinel-2 and Formosat-5 Imagery

Yi-Shan Chu, Hsuan-Cheng Wei

TL;DR

A ViT-based framework that improves the smoothness and reliability of disaster-affected area segmentation, offering a scalable approach to disaster mapping when accurate ground truth is unavailable. It supports multiple decoder variants and multi-stage loss strategies to improve performance under limited supervision.

Abstract

We propose a vision transformer (ViT)-based deep learning framework to refine disaster-affected area segmentation from remote sensing imagery, aiming to support and enhance the Emergent Value Added Product (EVAP) developed by the Taiwan Space Agency (TASA). The process starts with a small set of manually annotated regions. We then apply principal component analysis (PCA)-based feature space analysis and construct a confidence index (CI) to expand these labels, producing a weakly supervised training set. These expanded labels are then used to train ViT-based encoder-decoder models with multi-band inputs from Sentinel-2 and Formosat-5 imagery. Our architecture supports multiple decoder variants and multi-stage loss strategies to improve performance under limited supervision. During evaluation, model predictions are compared with higher-resolution EVAP output to assess spatial coherence and segmentation consistency. Case studies on the 2022 Poyang Lake drought and the 2023 Rhodes wildfire demonstrate that our framework improves the smoothness and reliability of segmentation results, offering a scalable approach for disaster mapping when accurate ground truth is unavailable.
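
The label-expansion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `expand_labels`, the parameters `n_components` and `max_distance`, and the use of a fixed Mahalanobis-distance cutoff as a stand-in for the paper's confidence index are all assumptions made for illustration.

```python
import numpy as np

def expand_labels(pixels, seed_mask, n_components=3, max_distance=3.0):
    """Expand manually labeled seed pixels in a PCA feature space.

    pixels:       (N, C) float array, one row of band values per pixel.
    seed_mask:    (N,) bool array, True where a pixel was manually annotated.
    max_distance: hypothetical cutoff on the Mahalanobis distance; the
                  paper's actual confidence index may be defined differently.
    Returns an (N,) bool mask containing the seeds plus the expanded pixels.
    """
    # PCA via SVD of the mean-centred data; rows of vt are principal axes.
    centred = pixels - pixels.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T            # (N, n_components)

    # Mean and covariance of the seed pixels in the reduced space.
    seed_scores = scores[seed_mask]
    mu = seed_scores.mean(axis=0)
    cov = np.atleast_2d(np.cov(seed_scores, rowvar=False))
    cov = cov + 1e-6 * np.eye(n_components)           # guard against singularity
    cov_inv = np.linalg.inv(cov)

    # Squared Mahalanobis distance of every pixel to the seed distribution.
    diff = scores - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

    # Keep the original seeds and add every pixel inside the cutoff.
    return seed_mask | (d2 <= max_distance ** 2)
```

In this sketch the expanded mask would serve as the weak training label; tightening `max_distance` trades label coverage for label purity.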

Paper Structure

This paper contains 26 sections, 11 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Schematic diagram illustrating the construction of the input tensor $X$ by concatenating pre-disaster ($I_{\mathrm{pre}}$) and post-disaster ($I_{\mathrm{post}}$) multi-band images along the channel dimension.
  • Figure 2: Comparison of model architectures used in this work. A: Vision Transformer (ViT) encoder with single-block decoder. B: ViT encoder with 4-layer CNN decoder. C: ViT encoder with U-Net style decoder.
  • Figure 3: Proposed system pipeline for disaster-affected area segmentation. The workflow consists of initial manual annotation, label expansion using Mahalanobis distance in the PCA feature space, followed by training of deep learning segmentation models.
  • Figure 4: Illustration of the label expansion pipeline. Manually labeled seed regions are projected into a reduced feature space via PCA. Pixels falling within a high-confidence region (as determined by Mahalanobis distance and user-specified confidence interval) are automatically assigned as expanded positive samples, producing an augmented label mask for weakly supervised learning.
  • Figure 5: Flowchart of the experimental pipeline. Pre/post RGBN (S2/FS5) are stacked and fed into a ViT encoder with three decoder variants: A (single conv), B (4-layer CNN), C (U-Net style). Models are trained under three losses: (1) BCE, (2) BCE--Dice, (3) two-stage BCE$\rightarrow$IoU. All metrics are computed against manually refined ground truth (GT).
  • ...and 5 more figures
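
The channel-wise stacking from Figure 1 and the three loss settings named in Figure 5 can be sketched in NumPy as follows. The paper's equations are not reproduced in this listing, so the code uses the common textbook forms of binary cross-entropy, soft Dice, and soft IoU. The function names, the `dice_weight` mixing factor, and the `switch_epoch` parameter are hypothetical and may differ from the authors' formulation.

```python
import numpy as np

def stack_pre_post(pre, post):
    """Build X by concatenating (C, H, W) pre/post images along channels."""
    if pre.shape[1:] != post.shape[1:]:
        raise ValueError("pre and post images must share height and width")
    return np.concatenate([pre, post], axis=0)

def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and a 0/1 mask."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def soft_dice_loss(prob, target, eps=1e-7):
    """1 - soft Dice coefficient, computed over the whole mask."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps))

def soft_iou_loss(prob, target, eps=1e-7):
    """1 - soft intersection-over-union, computed over the whole mask."""
    inter = np.sum(prob * target)
    union = np.sum(prob) + np.sum(target) - inter
    return float(1.0 - (inter + eps) / (union + eps))

def bce_dice_loss(prob, target, dice_weight=0.5):
    """Weighted sum of BCE and Dice; the weight is an assumed hyperparameter."""
    return (1.0 - dice_weight) * bce_loss(prob, target) + dice_weight * soft_dice_loss(prob, target)

def two_stage_loss(prob, target, epoch, switch_epoch):
    """BCE before switch_epoch, IoU from switch_epoch onward (assumed schedule)."""
    if epoch < switch_epoch:
        return bce_loss(prob, target)
    return soft_iou_loss(prob, target)
```

With four-band RGBN inputs, `stack_pre_post` yields an eight-channel tensor, matching the pre/post stacking the captions describe.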