Context-Enhanced Detector For Building Detection From Remote Sensing Images

Ziyue Huang, Mingming Zhang, Qingjie Liu, Wei Wang, Zhe Dong, Yunhong Wang

TL;DR

This paper tackles the challenge of accurate building detection in diverse remote sensing scenes by introducing Context-Enhanced Detector (CEDet), a three-stage cascade that explicitly models contextual information. It combines a Semantic Guided Contextual Mining (SGCM) module for multi-scale semantic fusion with a self-attention mechanism and a pseudo-masks segmentation loss, and an Instance Context Mining Module (ICMM) to capture instance-level spatial relationships via a relational graph. The CE Head decouples classification and regression and integrates ICMM within a cascade framework, leading to state-of-the-art performance on CNBuilding-9P, CNBuilding-23P, and SpaceNet, with notable gains in AP50 and AP75. Overall, the approach demonstrates that incorporating both global and instance-level contextual cues significantly enhances building detection in complex urban and suburban scenes, validating the practical value of context-aware detection in remote sensing applications.

Abstract

The field of building detection from remote sensing images has made significant progress, but faces challenges in achieving high-accuracy detection due to the diversity in building appearances and the complexity of vast scenes. To address these challenges, we propose a novel approach called Context-Enhanced Detector (CEDet). Our approach utilizes a three-stage cascade structure to enhance the extraction of contextual information and improve building detection accuracy. Specifically, we introduce two modules: the Semantic Guided Contextual Mining (SGCM) module, which aggregates multi-scale contexts and incorporates an attention mechanism to capture long-range interactions, and the Instance Context Mining Module (ICMM), which captures instance-level relationship context by constructing a spatial relationship graph and aggregating instance features. Additionally, we introduce a semantic segmentation loss based on pseudo-masks to guide contextual information extraction. Our method achieves state-of-the-art performance on three building detection benchmarks, including CNBuilding-9P, CNBuilding-23P, and SpaceNet.
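
The instance-level idea behind ICMM, linking RoIs through a spatial relationship graph and aggregating their features, can be illustrated with a small sketch. This is not the authors' implementation: the distance-based linking rule, the `radius` parameter, the mean aggregation, and the function names `build_spatial_graph` and `aggregate` are all assumptions chosen for illustration.

```python
import math


def build_spatial_graph(centers, radius):
    """Return an adjacency matrix linking RoIs whose centers lie within `radius`.

    Assumption: a plain Euclidean distance threshold stands in for the
    paper's spatial relationship; self-loops are kept (distance 0).
    """
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.dist(centers[i], centers[j]) <= radius:
                adj[i][j] = 1.0
    return adj


def aggregate(features, adj):
    """Average the features of each RoI's neighbors (row-normalized adjacency).

    Assumption: mean aggregation is used here only as the simplest
    graph-based way to pool context from related instances.
    """
    dim = len(features[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1 because of the self-loop
        out.append(
            [sum(w * f[d] for w, f in zip(row, features)) / degree for d in range(dim)]
        )
    return out


# Toy data: two nearby RoIs and one far away (hypothetical values).
centers = [(0.0, 0.0), (3.0, 4.0), (100.0, 100.0)]
features = [[1.0, 0.0], [3.0, 2.0], [10.0, 10.0]]

adj = build_spatial_graph(centers, radius=5.0)
context = aggregate(features, adj)
print(context)
```

In this toy case the two nearby RoIs share an averaged context feature, while the isolated RoI keeps its own feature unchanged. A learned variant would typically replace the plain mean with trainable weights.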

Paper Structure (16 sections, 8 equations, 7 figures, 10 tables)

This paper contains 16 sections, 8 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Context-Enhanced Detector (CEDet) is a three-stage detection model for high-accuracy building detection. The Semantic Guided Context Mining (SGCM) module enhances multi-scale feature context. The Context Enhancement OR-CNN Head (CE Head) adopts a decoupled structure and obtains relational context through the Instance Context Mining Module (ICMM).
  • Figure 2: The Semantic Guided Context Mining (SGCM) module has two phases: (a) Fusion, which enhances contextual information, performs feature fusion, and uses a semantic loss for supervision; (b) Reduction, which enhances the FPN features with the fused feature.
  • Figure 3: The Instance Context Mining Module (ICMM) uses spatial relationships between RoIs to extract instance-level contextual features.
  • Figure 4: Examples from the CNBuilding-9P dataset, which covers nine provinces in China and contains 50,782 images with 1,210,968 building instances. The first row shows the original images, and the orange boxes in the second row mark the ground truth. Zoom in to see more details.
  • Figure 5: Statistics of CNBuilding-9P and CNBuilding-23P, including the square root of building area and the number of buildings per image.
  • ...and 2 more figures
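
The two quantities named in the Figure 5 caption (square root of building area, buildings per image) are simple to compute from box annotations. The sketch below assumes a hypothetical annotation format of `(image_id, width, height)` tuples and made-up toy values; only the image and instance counts are taken from the Figure 4 caption.

```python
import math
from collections import Counter


def building_stats(annotations):
    """Compute per-building size and per-image building counts.

    `annotations` is a list of (image_id, width, height) tuples; this
    layout is an assumption for illustration, not the dataset's format.
    """
    sizes = [math.sqrt(w * h) for _, w, h in annotations]
    per_image = Counter(image_id for image_id, _, _ in annotations)
    return sizes, dict(per_image)


# Toy annotations (hypothetical values).
toy = [("img0", 16.0, 9.0), ("img0", 4.0, 4.0), ("img1", 25.0, 4.0)]
sizes, per_image = building_stats(toy)
print(sizes, per_image)

# Average density implied by the counts reported for CNBuilding-9P.
mean_buildings_per_image = 1_210_968 / 50_782
print(round(mean_buildings_per_image, 2))
```

The counts reported for CNBuilding-9P work out to roughly 24 building instances per image on average.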