Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning

Nethmi S. Hewawiththi, M. Mahesha Viduranga, Vanodhya G. Warnasooriya, Tharindu Fernando, Himal A. Suraweera, Sridha Sridharan, Clinton Fookes

TL;DR

The paper tackles bandwidth limitations in UAV-based disaster response by introducing an onboard, learnable semantic extractor that selects task-relevant information for transmission. It couples a PSPNet-based semantic segmentation module with a differentiable binary masking predictor, jointly trained with two downstream tasks: Visual Question Answering and damage-extent classification, enabling substantial data reduction without compromising accuracy. Evaluations on FloodNet and RescueNet show data transmission reductions of approximately 86–92% with minimal impact on downstream performance, demonstrating strong potential for real-time, bandwidth-constrained disaster response. The approach is task-agnostic, lightweight, and suitable for onboard deployment, offering significant practical impact for faster, more reliable emergency decision-making.
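
The filtering step summarized above (multiplying a segmentation mask by a binary mask before transmission) can be illustrated with a minimal pure-Python sketch. The function names, the toy 4 × 4 masks, and the use of "fraction of zeroed pixels" as a stand-in for data reduction are all assumptions made for illustration; the summary does not state how the paper measures the 86–92% reduction, and the real binary mask comes from a learned predictor rather than being hand-written.

```python
def apply_binary_mask(seg_mask, binary_mask):
    """Element-wise product of a segmentation mask and a 0/1 mask (hypothetical helper)."""
    return [
        [s * b for s, b in zip(seg_row, bin_row)]
        for seg_row, bin_row in zip(seg_mask, binary_mask)
    ]


def zeroed_fraction(binary_mask):
    """Fraction of pixels the binary mask suppresses (an assumed proxy for data reduction)."""
    total = sum(len(row) for row in binary_mask)
    kept = sum(sum(row) for row in binary_mask)
    return 1 - kept / total


# Toy segmentation mask: each entry is a class label (values are illustrative only).
seg_mask = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [0, 3, 3, 0],
    [0, 0, 0, 0],
]

# Toy binary mask: 1 keeps a pixel, 0 drops it. Hand-written here, learned in the paper.
binary_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

filtered = apply_binary_mask(seg_mask, binary_mask)
reduction = zeroed_fraction(binary_mask)
```

In this toy case only 2 of 16 pixels survive, so most of the mask is zeros and should compress well; the actual savings would depend on how the masked output is encoded for transmission.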

Abstract

Unmanned aerial vehicle (UAV)-assisted disaster recovery missions have been promoted recently due to their reliability and flexibility. Machine learning algorithms running onboard significantly enhance the utility of UAVs by enabling real-time data processing and efficient decision-making, despite the resource-constrained environment. However, limited bandwidth and intermittent connectivity make transmitting the outputs to ground stations challenging. This paper proposes a novel semantic extractor that can be adopted into any machine learning downstream task to identify the critical data required for decision-making. The semantic extractor can be executed onboard, which reduces the amount of data that needs to be transmitted to ground stations. We test the proposed architecture together with the semantic extractor on two publicly available datasets, FloodNet and RescueNet, for two downstream tasks: visual question answering and disaster damage level classification. Our experimental results demonstrate that the proposed method maintains high accuracy across different downstream tasks while significantly reducing the volume of transmitted data, highlighting the effectiveness of our semantic extractor in capturing task-specific salient information.

Paper Structure

This paper contains 16 sections, 17 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: An overview of the architecture of the proposed framework. First, a UAV captures an image of the emergency area, which is converted into a semantic segmentation mask using PSPNet (Zhao et al., 2017). Next, an FCN-based mask predictor creates a binary mask for the semantic segmentation mask to select the data critical to the downstream task. Finally, the semantic mask is multiplied by the binary mask to filter out unnecessary data, making it ready to be transmitted to the ground station.
  • Figure 2: Detailed structure of the pyramid pooling module. First, features are extracted at four different spatial scales through pyramid pooling. Here, the scales are 1 × 1, 2 × 2, 3 × 3, and 6 × 6. Then, a 1 × 1 convolution is applied, enhancing the nonlinear learning ability of the multiscale features. Next, bilinear interpolation is used to upsample the convolved feature maps. Finally, they are concatenated with the four upsampled feature maps.
  • Figure 3: The architecture of the data masking model. Initially, the segmented mask is sent through three transposed convolutional layers followed by a ReLU activation with an upsampling factor of two. Next, the output of the third transposed convolutional layer is passed through a convolutional layer that produces a single-channel output. Then, the sigmoid function maps it to values between 0 and 1, and the result is finally resized to output a binary mask of the same size as the input image.
  • Figure 4: The architecture of the visual question answering (VQA) model used in our work (Kane and Khose, 2022). First, features of the masked segmentation mask and of the question are created using visual and text encoders, respectively. Next, the two features are fused using a combination mechanism, and finally, an answer classifier is used to output the appropriate answer.
  • Figure 5: The architecture of the proposed classifier model. First, the image transmitted to the ground station is sent through a convolutional layer followed by batch normalization and ReLU activation. Next, the data is passed consecutively through the four main layers of the ResNet-50 backbone, and adaptive average pooling is applied to reduce the spatial dimensions of the feature maps to 1 × 1. Finally, the output is flattened and a fully connected layer is used to produce the final output of the model.
  • ...and 3 more figures
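
The multiscale pooling step described in the Figure 2 caption can be sketched in pure Python as below. This is a minimal illustration, not the paper's implementation: it covers only the pooling at scales 1, 2, 3, and 6 on a single-channel map, and omits the 1 × 1 convolution, bilinear upsampling, and concatenation. The function names and the bin-boundary rule (floor for the start index, ceiling for the end index) are assumptions chosen for this sketch.

```python
import math


def adaptive_avg_pool2d(fmap, bins):
    """Average-pool a 2D list (H x W) down to bins x bins (hypothetical helper)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        # Row range covered by output bin i: floor for start, ceil for end.
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def pyramid_pool(fmap, scales=(1, 2, 3, 6)):
    """Pool one feature map at each pyramid scale; returns {scale: pooled map}."""
    return {s: adaptive_avg_pool2d(fmap, s) for s in scales}


# Toy 6 x 6 single-channel feature map with values 0..35 (illustrative only).
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = pyramid_pool(fmap)
```

The 1 × 1 scale collapses the map to a single global average, while the 6 × 6 scale on this 6 × 6 input leaves every value unchanged; the coarser scales are what give the module its global context.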