P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images

Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji

TL;DR

This work tackles the extraction of regular building contours from remote sensing imagery, where irregular shapes, occlusions, and noise hinder traditional mask-, contour-, and vertex-based methods. It introduces P2PFormer, a primitive-to-polygon framework that first detects buildings, then segments generic primitives (e.g., vertices, lines, corners) within each bounding box and directly predicts their order to form regular contours, eliminating post-processing. The primitive segmenter employs group queries and a dynamic query position embedding to improve segmentation quality, while a lightweight order decoder directly regresses the primitive sequence. On the WHU, CrowdAI, and WHU-Mix datasets, P2PFormer achieves state-of-the-art results, surpassing the previous best method, PolyWorld, by 2.7 AP and 6.5 AP75 on CrowdAI. It also demonstrates robustness to occlusion and diverse building styles, establishing a strong end-to-end baseline for regular building contour extraction in remote sensing imagery.

Abstract

Extracting building contours from remote sensing imagery is a significant challenge due to buildings' complex and diverse shapes, occlusions, and noise. Existing methods often struggle with irregular contours, rounded corners, and redundant points, necessitating extensive post-processing to produce regular polygonal building contours. To address these challenges, we introduce a novel, streamlined pipeline that generates regular building contours without post-processing. Our approach begins with the segmentation of generic geometric primitives (which can include vertices, lines, and corners), followed by the prediction of their sequence. This allows for the direct construction of regular building contours by sequentially connecting the segmented primitives. Building on this pipeline, we developed P2PFormer, which utilizes a transformer-based architecture to segment geometric primitives and predict their order. To enhance the segmentation of primitives, we introduce a unique representation called group queries. This representation comprises a set of queries and a single query position, which together improve the focus on multiple midpoints of primitives and their efficient linkage. Furthermore, we propose an innovative implicit update strategy for the query position embedding aimed at sharpening the focus of queries on the correct positions and, consequently, enhancing the quality of primitive segmentation. Our experiments demonstrate that P2PFormer achieves new state-of-the-art performance on the WHU, CrowdAI, and WHU-Mix datasets, surpassing the previous SOTA PolyWorld by a margin of 2.7 AP and 6.5 AP75 on the largest dataset, CrowdAI.
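
The final step of the pipeline, connecting segmented primitives in their predicted order, can be illustrated with a minimal sketch. It assumes the simplest case of vertex primitives and assumes that the order prediction is available as one scalar per primitive (smaller means earlier in the sequence); the abstract does not specify the output format, so this is an illustrative assumption. The names `assemble_polygon` and `polygon_area` are hypothetical helpers, not part of P2PFormer.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def assemble_polygon(vertices: List[Point], order_values: List[float]) -> List[Point]:
    """Return the vertices sorted by their predicted order value.

    Assumption: one scalar order value per primitive, smaller = earlier.
    The ring is implicitly closed (last vertex connects back to the first).
    """
    if len(vertices) != len(order_values):
        raise ValueError("each vertex needs exactly one order value")
    ranked = sorted(range(len(vertices)), key=lambda i: order_values[i])
    return [vertices[i] for i in ranked]


def polygon_area(ring: List[Point]) -> float:
    """Absolute area of a closed ring via the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Four corners of a 4 x 3 rectangle, as an unordered set of primitives.
vertices = [(0.0, 0.0), (4.0, 3.0), (4.0, 0.0), (0.0, 3.0)]
# Hypothetical order values for the four primitives.
order_values = [0.10, 0.60, 0.35, 0.90]

ring = assemble_polygon(vertices, order_values)
print(ring)                    # [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(polygon_area(ring))      # 12.0
print(polygon_area(vertices))  # 0.0: unordered vertices form a self-intersecting bowtie
```

The contrast between the last two lines shows why the order matters: the same four vertices yield a valid rectangle only when connected in the predicted sequence.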

Paper Structure (15 sections, 9 equations, 12 figures, 8 tables)


Figures (12)

  • Figure 1: Different pipelines for regular building contour extraction. From top to bottom: mask-based, contour-based, vertex-based, and our proposed primitive-based pipeline. Blue trapezoids represent networks, and green rectangles represent post-processing steps. Only the end-to-end primitive-based pipeline eliminates handcrafted post-processing entirely.
  • Figure 2: Overall pipeline of P2PFormer. P2PFormer first detects each building, then performs primitive segmentation and primitive order regression using the building feature.
  • Figure 3: Architecture of the primitive segmenter. Circles and hexagons represent group queries, while triangles denote query position embeddings. The same color signifies that they belong to the same primitive.
  • Figure 4: The architecture of the primitive decoder block.
  • Figure 5: The structure of the primitive predictor (using line primitive as the example).
  • ...and 7 more figures