Enhancing DeepLabV3+ to Fuse Aerial and Satellite Images for Semantic Segmentation
Anas Berka, Mohamed El Hajji, Raphael Canals, Youssef Es-saady, Adel Hafiane
TL;DR
This work addresses land-cover semantic segmentation by fusing high-resolution aerial imagery with multispectral satellite data in a single DeepLabV3+ model. It introduces the DIFD architecture, featuring a novel UpConvT-based satellite upsampling path and a weighted upsampling decoder to inject satellite information into aerial features. The method achieves a mean IoU of 84.91% on LandCover.ai with Sentinel-2 data without augmentation, demonstrating improved performance on small and complex classes such as roads. The approach offers a practical pathway for leveraging complementary remote-sensing sources in a unified model, with potential for generalization to additional datasets and fusion modalities.
Abstract
Aerial and satellite imagery are complementary remote sensing sources, offering high-resolution detail alongside expansive spatial coverage. Using these sources for land cover segmentation introduces several challenges, which has prompted the development of a variety of segmentation methods. Among them, the DeepLabV3+ architecture is considered a promising approach for single-source image segmentation. Despite its reliable results, its robustness and performance can still be improved. This is particularly important for multimodal image segmentation, where fusing diverse types of information is essential. One way to achieve this is to extend the architecture with new components and to modify some of its internal processes. In this paper, we enhance DeepLabV3+ by introducing a new block of transposed convolutional layers that upsamples a second input so that it can be fused with the high-level features. This block is designed to amplify and integrate information from satellite images, enriching the segmentation process through fusion with aerial images. For the experiments, we used the LandCover.ai (Land Cover from Aerial Imagery) dataset for aerial images, together with a corresponding dataset derived from Sentinel-2 data. By fusing both sources, the model achieved a mean Intersection over Union (mIoU) of 84.91% without data augmentation.
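
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean IoU score is commonly computed from per-pixel class labels. It is not the authors' evaluation code: the function name `mean_iou`, the toy labels, and the choice to skip classes that appear in neither the reference nor the prediction are assumptions made for illustration.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean Intersection over Union over flat sequences of class labels."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                            # true positives
        fn = sum(confusion[c]) - tp                     # false negatives
        fp = sum(row[c] for row in confusion) - tp      # false positives
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 6 pixels and 3 hypothetical classes.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
score = mean_iou(reference, predicted, num_classes=3)
print(f"mIoU = {score:.4f}")
```

Each class's IoU is the number of correctly labeled pixels divided by the number of pixels assigned to that class in either the reference or the prediction. Averaging per class rather than per pixel means that small classes such as roads weigh as much as large ones in the final score.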
