Dual skip connections in U-Net, ResUnet and U-Net3+ for remote extraction of buildings

Bipul Neupane, Jagannath Aryal, Abbas Rajabifard

TL;DR

Three dual skip connection mechanisms are developed for three networks (U-Net, ResUnet, and U-Net3+) to selectively deepen the essential feature maps for improved performance. The authors conclude that selectively densifying feature maps and skip connections enhances network performance without a substantial increase in parameters.

Abstract

Urban buildings are extracted from high-resolution Earth observation (EO) images using semantic segmentation networks like U-Net and its successors. Each iteration aims to improve performance by employing a denser skip connection mechanism that harnesses multi-scale features for accurate object mapping. However, denser connections increase network parameters and do not necessarily contribute to precise segmentation. In this paper, we develop three dual skip connection mechanisms for three networks (U-Net, ResUnet, and U-Net3+) to selectively deepen the essential feature maps for improved performance. The three mechanisms are evaluated on feature maps of different scales, producing nine new network configurations. These configurations are evaluated against their original vanilla configurations on four building footprint datasets of different spatial resolutions, including a multi-resolution (0.3+0.6+1.2m) dataset that we develop for complex urban environments. The evaluation revealed that densifying the large- and small-scale features in U-Net and U-Net3+ produces up to 0.905 F1, higher than TransUnet (0.903) and Swin-Unet (0.882) on our new dataset, with up to 19x fewer parameters. The results show that selectively densifying feature maps and skip connections enhances network performance without a substantial increase in parameters. The findings and the new dataset will contribute to the computer vision domain and urban planning decision processes.
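
The F1 scores quoted in the abstract can be read against the standard definition F1 = 2TP / (2TP + FP + FN). The following is a minimal sketch of that definition for binary building masks, not the authors' evaluation code. The function name f1_score_binary, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. The paper's actual averaging protocol (per image or per dataset) is not stated in this summary.

```python
def f1_score_binary(pred, target):
    """Pixel-wise F1 for two binary masks given as nested lists (1 = building)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # building predicted and present
            elif p and not t:
                fp += 1          # building predicted but absent
            elif t and not p:
                fn += 1          # building present but missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


# Hypothetical 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = f1_score_binary(pred, target)
```

With these toy masks the score is 4 / 6, so a single false positive and a single false negative against two correct building pixels already lower F1 to about 0.67.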

Paper Structure (22 sections, 14 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Illustration of the plain skip connections in U-Net and the proposed DSCM.
  • Figure 2: Illustration of DS-ResUNet-A along with the differences between the plain skip connections in ResUnet and the proposed DRSCM.
  • Figure 3: Illustration of the proposed DS-UNet3+ networks DS-UNet3+(L), DS-UNet3+(S), and DS-UNet3+(A).
  • Figure 4: Illustration of the dual skip feature aggregation mechanism (abbr. DSFAM) at the third decoder layer $X_{De}^{3}$ of DS-UNet3+ (figure adapted from Huang et al., 2020).
  • Figure 5: Segmentation output from the proposed networks and their original vanilla baseline networks on the MELB building dataset.
  • ...and 3 more figures