Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation

Yuxia Chen, Pengcheng Fang, Jianhui Yu, Xiaoling Zhong, Xiaoming Zhang, Tianrui Li

TL;DR

This work tackles semantic segmentation of high-resolution remote sensing imagery, where objects of one class vary widely in scale and shape and cluttered backgrounds cause confusion. It introduces Hi-ResNet, a CNN-based backbone that preserves high-resolution representations via a funnel module, a multi-branch module of information aggregation (IA) blocks for multi-scale feature interaction, and a feature refinement module augmented with a Class-agnostic Edge Aware (CEA) loss. Training combines three losses (Generalised Dice, Label Smoothing CE, and CEA) and uses both supervised Mapillary and unsupervised MoCoV2 pre-training to boost performance on LoveDA, Potsdam, and Vaihingen. Empirical results show consistent improvements over state-of-the-art methods in mIoU and boundary accuracy, with competitive efficiency. The approach benefits fine-grained, small-object segmentation across diverse geographies, making it suitable for real-world remote sensing tasks and large-scale mapping.

Abstract

High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas. However, objects of the same category within HRS images generally show significant differences in scale and shape across diverse geographical environments, making it difficult to fit the data distribution. Additionally, complex backgrounds make objects of different categories look alike, so a substantial number of objects are misclassified as background. These issues make existing learning algorithms sub-optimal. In this work, we address these problems by proposing a High-resolution remote sensing network (Hi-ResNet) with an efficient network structure, which consists, in sequence, of a funnel module, a multi-branch module with stacks of information aggregation (IA) blocks, and a feature refinement module, together with a Class-agnostic Edge Aware (CEA) loss. First, the funnel module downsamples the initial input image, which reduces the computational cost, and extracts high-resolution semantic information from it. Second, we incrementally downsample the processed feature maps into multi-resolution branches to capture image features at different scales, and apply IA blocks, which capture key latent information by leveraging attention mechanisms, for effective feature aggregation, distinguishing image features of the same class with varying scales and shapes. Finally, our feature refinement module integrates the CEA loss function, which disambiguates inter-class objects with similar shapes and increases the data distribution distance for correct predictions. With effective pre-training strategies, we demonstrate the superiority of Hi-ResNet over state-of-the-art methods on three HRS segmentation benchmarks.
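
Alongside the CEA loss, the TL;DR names Label Smoothing CE and Generalised Dice as training losses. The NumPy sketch below shows the textbook form of those two losses for a batch of logits shaped (N, C, H, W) and integer labels shaped (N, H, W). It is an illustration only: the function names, the smoothing value of 0.1, and the choice to give zero weight to classes absent from the batch are assumptions, and the source does not state how Hi-ResNet weights or combines the terms.

```python
import numpy as np


def softmax(logits, axis=1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def one_hot(labels, num_classes):
    """Turn integer labels (N, H, W) into one-hot maps (N, C, H, W)."""
    return np.moveaxis(np.eye(num_classes)[labels], -1, 1)


def label_smoothing_ce(logits, labels, smoothing=0.1):
    """Cross-entropy against targets softened to (1 - s) * one_hot + s / C."""
    num_classes = logits.shape[1]
    log_probs = np.log(softmax(logits) + 1e-12)
    targets = (1.0 - smoothing) * one_hot(labels, num_classes) + smoothing / num_classes
    return float(-(targets * log_probs).sum(axis=1).mean())


def generalised_dice_loss(logits, labels, eps=1e-6):
    """Generalised Dice loss with inverse squared class-volume weights."""
    num_classes = logits.shape[1]
    probs = softmax(logits)
    targets = one_hot(labels, num_classes)
    axes = (0, 2, 3)  # sum over batch and spatial positions, keep classes
    volume = targets.sum(axis=axes)
    # Assumption: classes absent from the batch get zero weight instead of
    # an unbounded 1 / 0 weight.
    weights = np.where(volume > 0, 1.0 / np.maximum(volume, 1.0) ** 2, 0.0)
    intersection = (weights * (probs * targets).sum(axis=axes)).sum()
    union = (weights * (probs + targets).sum(axis=axes)).sum()
    return float(1.0 - 2.0 * intersection / (union + eps))
```

The inverse squared-volume weighting lets a class that covers few pixels contribute to the Dice term on a similar footing to a class that covers many, which is the usual motivation for this loss when small objects matter.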

Paper Structure (39 sections, 11 equations, 11 figures, 14 tables)

Figures (11)

  • Figure 1: Heatmap comparison of model behavior on different images, showing the feature information obtained by upsampling and merging at the end of each layer for the baseline and Hi-ResNet base. Rows (a), (b), and (c) show the original image, the baseline features, and the Hi-ResNet base features, respectively. Compared with the baseline, Hi-ResNet base extracts richer feature information.
  • Figure 2: The overall architecture of Hi-ResNet, partitioned into four components. (a) The funnel module, composed of a downsample part and a funnel stem, downsamples the input imagery and facilitates feature extraction. (b) The multi-branch module further refines these features by merging multi-resolution convolution streams. (c) In the feature refinement module, coarse features are computed directly via a convolution layer, and refined features are produced with OCR [yuan2019segmentation]. During inference, the coarse results and refined results are added in a 1:1 ratio as the model's output. (d) Multiple loss functions are employed, including LSCE loss [muller2019does] and GD loss [sudre2017generalised], which are computed directly from the ground truth and predictions. In addition, the CEA loss randomly selects a category, designates all others as background, and computes the loss between the two categories.
  • Figure 3: The structure of the funnel module, where IB refers to inverted bottleneck. The numbers in each block give the kernel size and the number of channels, respectively.
  • Figure 4: (a) The output features of the multi-branch module in Hi-ResNet before the extension. (b) The output features of the multi-branch module in Hi-ResNet after the extension.
  • Figure 5: The process of aggregating feature information across resolutions in the fusion layer of the network. Here we also swap the order of the BN and the conv.
  • ...and 6 more figures
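
Two steps described in the Figure 2 caption can be sketched in a few lines of NumPy: the 1:1 addition of coarse and refined outputs at inference, and the class-agnostic reduction in which CEA picks one category at random and treats all others as background. This is a hedged illustration rather than the authors' implementation. All function names are hypothetical, and the caption does not say which loss is computed on the resulting two-class problem, so the binary cross-entropy below is only a placeholder.

```python
import numpy as np


def fuse_coarse_and_refined(coarse_logits, refined_logits):
    """Add the two heads' outputs with equal (1:1) weight, then take the arg-max class."""
    fused = coarse_logits + refined_logits
    return fused.argmax(axis=1)


def class_agnostic_targets(labels, probs, rng):
    """Pick one class at random and reduce labels and predictions to class vs. rest."""
    num_classes = probs.shape[1]
    chosen = int(rng.integers(num_classes))
    binary_labels = (labels == chosen).astype(np.float64)  # 1 = chosen class, 0 = background
    foreground_prob = probs[:, chosen]
    return chosen, binary_labels, foreground_prob


def binary_cross_entropy(foreground_prob, binary_labels, eps=1e-12):
    """Placeholder two-class loss; the source does not specify the actual loss form."""
    p = np.clip(foreground_prob, eps, 1.0 - eps)
    loss = -(binary_labels * np.log(p) + (1.0 - binary_labels) * np.log(1.0 - p))
    return float(loss.mean())
```

A call might look like `chosen, target, prob = class_agnostic_targets(labels, probs, np.random.default_rng(0))`, followed by `binary_cross_entropy(prob, target)`, with `probs` shaped (N, C, H, W) and `labels` shaped (N, H, W).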