Contextual Hourglass Network for Semantic Segmentation of High Resolution Aerial Imagery

Panfeng Li, Youzuo Lin, Emily Schultz-Fellenz

TL;DR

This work develops a novel semantic segmentation method that applies an attention mechanism to processed low-resolution feature maps to exploit contextual semantics, and builds a stacked encoder-decoder structure by connecting multiple contextual hourglass modules end to end.
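The stacked encoder-decoder idea can be illustrated with a minimal NumPy sketch of a recursive hourglass. It rests on several assumptions that the text does not state. `block` is a hypothetical stand-in for the residual blocks of Figure 1(a). `context` is a hypothetical stand-in for the attention-based context layer, applied here at the lowest resolution. Downsampling and upsampling use 2x2 average pooling and nearest-neighbour repetition. Each module keeps a full-resolution skip branch, recursively processes a half-resolution branch, and sums the two, which is how an hourglass combines features across scales.

```python
import numpy as np
from typing import Callable, List

# Hypothetical type for any feature-map transform (C, H, W) -> (C, H, W).
FeatureFn = Callable[[np.ndarray], np.ndarray]


def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on a (C, H, W) map; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def hourglass(x: np.ndarray, depth: int, block: FeatureFn, context: FeatureFn) -> np.ndarray:
    """One (assumed) contextual hourglass: H and W must be divisible by 2**depth."""
    skip = block(x)                      # full-resolution branch
    low = block(downsample(x))           # encoder step: halve the resolution
    if depth > 1:
        low = hourglass(low, depth - 1, block, context)
    else:
        low = context(block(low))        # bottleneck: context/attention on low-res features
    return skip + upsample(block(low))   # decoder step: upsample and merge with the skip branch


def stacked_hourglass(x: np.ndarray, num_stacks: int, depth: int,
                      block: FeatureFn, context: FeatureFn,
                      head: Callable[[np.ndarray], object]) -> List[object]:
    """Connect hourglass modules end to end and collect one prediction per module."""
    predictions = []
    for _ in range(num_stacks):
        x = hourglass(x, depth, block, context)
        predictions.append(head(x))      # intermediate prediction for supervision
    return predictions                   # the last entry acts as the final prediction
```

Passing `block=lambda f: f` and `context=lambda f: f` makes the function easy to check by hand. In the original stacked hourglass design of Newell et al., each intermediate prediction is also projected back and added to the features entering the next module; that feedback path is omitted here for brevity.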

Abstract

Semantic segmentation of aerial imagery is a challenging and important problem in the analysis of remotely sensed imagery. In recent years, with the success of deep learning, various models based on convolutional neural networks (CNNs) have been developed. However, because object sizes vary widely and class labels are imbalanced, it can be challenging to obtain accurate pixel-wise semantic segmentation results. To address these challenges, we develop a novel semantic segmentation method called the Contextual Hourglass Network. To improve the robustness of the prediction, we design a new contextual hourglass module that applies an attention mechanism to processed low-resolution feature maps to exploit contextual semantics. We further exploit a stacked encoder-decoder structure by connecting multiple contextual hourglass modules end to end. This architecture can effectively extract rich multi-scale features and, through intermediate supervision, adds more feedback loops for better learning of contextual semantics. To demonstrate the efficacy of our method, we test it on the Potsdam and Vaihingen datasets. Compared with other baseline methods, our method yields the best overall performance.
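Intermediate supervision, in which (per Figure 1(a)) every intermediate and final prediction is scored against the same ground truth, can be sketched as a summed pixel-wise loss. This minimal NumPy version assumes softmax probabilities of shape (C, H, W), integer label maps of shape (H, W), cross-entropy as the per-prediction loss, and equal weights for all terms. None of these choices is specified in the text, and both function names are hypothetical.

```python
import numpy as np
from typing import List


def pixel_cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean pixel-wise cross-entropy.

    probs:  (C, H, W) class probabilities (softmax already applied).
    labels: (H, W) integer class indices in [0, C).
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Pick the predicted probability of the true class at every pixel.
    true_class_probs = probs[labels, rows, cols]
    return float(-np.mean(np.log(true_class_probs + eps)))


def intermediate_supervision_loss(predictions: List[np.ndarray], labels: np.ndarray) -> float:
    """Sum the losses of all intermediate and final predictions.

    Every prediction is compared with the SAME ground truth; equal
    weighting of the terms is an assumption of this sketch.
    """
    return sum(pixel_cross_entropy(p, labels) for p in predictions)
```

Because each module's output has its own loss term, gradients reach the early modules directly rather than only through the final prediction, which is the practical effect of the added feedback loops.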

Paper Structure

This paper contains 9 sections, 1 equation, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: (a) Each white box corresponds to a residual block He_2016_CVPR. Blue circles are the intermediate predictions, whereas the yellow one is the final prediction. A loss function is applied to each of these predictions against the same ground truth. The region in the dashed orange box represents the encoding procedure, where the red rhombus is the context layer and the pink box is the branch for the semantic encoding loss. (b) The Encoding Layer contains a codebook and smoothing factors that capture encoded semantics. The top branch predicts scaling factors that selectively highlight class-dependent feature maps. The bottom branch predicts the presence of each category in the scene. (Notation: FC, fully connected layer; $\otimes$, channel-wise multiplication.)
  • Figure 2: Selected results on the Potsdam and Vaihingen test sets.
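The Encoding Layer described for Figure 1(b) appears to follow the Context Encoding Module of EncNet (Zhang et al., CVPR 2018). The NumPy sketch below assumes that formulation. Each spatial descriptor is softly assigned to K codewords using per-codeword smoothing factors, the weighted residuals are aggregated into one encoding vector, and two fully connected branches read that vector. The top branch produces sigmoid scaling factors applied channel-wise ($\otimes$) to the feature map. The bottom branch predicts per-class presence and is trained with binary cross-entropy. Batch normalization is omitted, the weights are plain arrays rather than learned parameters, and every function name is hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def encoding_layer(x: np.ndarray, codewords: np.ndarray, smoothing: np.ndarray) -> np.ndarray:
    """Aggregate N descriptors x (N, C) against K codewords (K, C) with
    smoothing factors (K,). Returns an encoding vector of length C."""
    residuals = x[:, None, :] - codewords[None, :, :]        # (N, K, C)
    sq_dist = np.sum(residuals ** 2, axis=2)                  # (N, K)
    assign = softmax(-smoothing[None, :] * sq_dist, axis=1)   # soft assignment over codewords
    encoded = np.sum(assign[:, :, None] * residuals, axis=0)  # (K, C) weighted residuals
    return np.maximum(encoded, 0.0).sum(axis=0)               # ReLU, then sum over codewords


def context_branches(feature_map, codewords, smoothing, w_scale, b_scale, w_presence, b_presence):
    """feature_map: (C, H, W); w_scale: (C, C); w_presence: (num_classes, C)."""
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w).T                # each pixel becomes a C-dim descriptor
    e = encoding_layer(x, codewords, smoothing)        # (C,)
    gamma = sigmoid(w_scale @ e + b_scale)             # top branch: per-channel scaling factors
    highlighted = feature_map * gamma[:, None, None]   # channel-wise multiplication
    presence = sigmoid(w_presence @ e + b_presence)    # bottom branch: per-class presence
    return highlighted, presence


def presence_target(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """1.0 for every class that appears in the label map, else 0.0."""
    target = np.zeros(num_classes)
    target[np.unique(labels)] = 1.0
    return target


def presence_loss(presence: np.ndarray, target: np.ndarray, eps: float = 1e-12) -> float:
    """Binary cross-entropy between predicted and actual class presence."""
    p = np.clip(presence, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))
```

The presence target depends only on which classes occur in an image, not on how many pixels they cover, so small classes contribute to this loss as strongly as large ones. That property is one plausible reason such a branch helps with the imbalanced class labels mentioned in the abstract.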