RoIPoly: Vectorized Building Outline Extraction Using Vertex and Logit Embeddings

Weiqin Jiao, Hao Cheng, Claudio Persello, George Vosselman

TL;DR

RoIPoly extracts vectorized building outlines from aerial imagery while addressing two weaknesses of earlier methods: vertex redundancy and computational cost. It does so with RoI-constrained vertex queries and a learnable logit embedding fused through adaLN. The method predicts a fixed-size vertex sequence per polygon and uses an ordering-based matching scheme, enabling end-to-end training without post-processing. With a ResNet50 backbone, it outperforms existing methods on most MS-COCO metrics on CrowdAI, particularly for small buildings, and ranks second on most Structured3D metrics, indicating cross-domain generalization. Restricting attention to per-building RoIs lowers FLOPs while maintaining or improving polygon quality.

Abstract

Polygonal building outlines are crucial for geographic and cartographic applications. Existing approaches for outline extraction from aerial or satellite imagery are typically decomposed into subtasks, e.g., building masking and vectorization, or treat the task as a sequence-to-sequence prediction of ordered vertices. The former lacks efficiency, and the latter often generates redundant vertices, both resulting in suboptimal performance. To handle these issues, we propose a novel Region-of-Interest (RoI) query-based approach called RoIPoly. Specifically, we formulate each vertex as a query and constrain the query attention to the most relevant regions of a potential building, yielding reduced computational overhead and more efficient vertex-level interaction. Moreover, we introduce a novel learnable logit embedding to facilitate vertex classification on the attention map; thus, no post-processing is needed for redundant vertex removal. We evaluated our method on the vectorized building outline extraction dataset CrowdAI and the 2D floorplan reconstruction dataset Structured3D. On the CrowdAI dataset, RoIPoly with a ResNet50 backbone outperforms existing methods with the same or better backbones on most MS-COCO metrics, especially on small buildings, and achieves competitive results in polygon quality and vertex redundancy without any post-processing. On the Structured3D dataset, our method achieves the second-best performance on most metrics among existing methods dedicated to 2D floorplan reconstruction, demonstrating its cross-domain generalization capability. The code will be released upon acceptance of this paper.
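
As a rough illustration of how a fixed-size vertex sequence can yield a variable-length polygon without a separate cleanup step, the sketch below keeps only the vertices whose classification score passes a threshold, preserving sequence order. The function name `select_vertices`, the sigmoid scoring, the 0.5 threshold, and the example values are all illustrative assumptions, not details taken from the paper.

```python
import math


def select_vertices(coords, logits, threshold=0.5):
    """Keep vertices whose sigmoid(logit) >= threshold, in sequence order.

    coords: list of (x, y) tuples, one per vertex slot (fixed length M).
    logits: list of floats, one classification logit per vertex slot.
    The threshold and the sigmoid scoring are assumptions for this sketch.
    """
    if len(coords) != len(logits):
        raise ValueError("coords and logits must have the same length")
    kept = []
    for (x, y), logit in zip(coords, logits):
        score = 1.0 / (1.0 + math.exp(-logit))  # map logit to (0, 1)
        if score >= threshold:
            kept.append((x, y))
    return kept


# Hypothetical prediction for one building: M = 6 slots, 4 of them valid.
coords = [(0.1, 0.1), (0.5, 0.1), (0.9, 0.1), (0.9, 0.8), (0.5, 0.8), (0.1, 0.8)]
logits = [3.2, -2.5, 2.7, 4.0, -1.8, 2.1]
polygon = select_vertices(coords, logits)
print(polygon)
```

Because the surviving vertices keep their original order, the result can be read directly as a closed polygon; the two low-scoring slots are simply dropped.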

Paper Structure (14 sections, 13 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Examples of the Region-of-Interest (RoI) features of different buildings. A warmer color depicts a higher value in the RoI feature maps, indicating the visual cues for the buildings (highlighted in white polylines) to be outlined. Our RoI query-based approach ensures vertex query attention is confined to building RoIs so as to exclude irrelevant background features and improve attention efficiency. It should be noted that the polylines in this figure only serve as the visual reference to show the alignment between the buildings and feature maps; they are not included in the feature maps for the model training.
  • Figure 2: Distance-based matching in PolyBuilding (Hu et al., 2023) (first row) vs. our ordering-based matching (second row) for vertex sampling in each polygon. The sampled vertices are marked in blue, while the ground truth vertices are marked in yellow.
  • Figure 3: The overall architecture of our proposed method RoIPoly. An image containing multiple buildings is fed into a backbone to extract multi-scale feature maps and an object detector to generate proposal Building Bounding Boxes (BBBs). The RoIAlign module extracts the RoI features for each building instance located by a corresponding BBB, which are later fed into an encoder-decoder pipeline with vertex and logit queries. In each such pipeline, a vertex coordinate regression head and a vertex classification head are utilized to provide the final prediction simultaneously. The variables in this figure are as follows: $\text{BS}$ is the batch size, $R$ is the number of ground truth polygons of an image, $N$ is the number of proposal polygons per image, $M$ is the number of proposal vertices per polygon, $C$ is the feature dimension, and $H \times W$ is the RoI resolution.
  • Figure 4: Comparison of the decoder in PolyBuilding (Hu et al., 2023) and RoIPoly. For clarity, we only show one layer of the decoder, omitting the sigmoid and feed-forward modules.
  • Figure 5: Examples of vectorized building polygon extraction on the CrowdAI dataset. Top row: Ground truth. Middle row: HiSup (Xu et al., 2023). Bottom row: RoIPoly (ours). The white boxes highlight the areas that include intricate artifacts of edges and redundant vertices, and the yellow boxes highlight the areas where vertices from different buildings are mixed.
  • ...and 2 more figures
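
To make the effect of RoI-constrained attention concrete, the following back-of-the-envelope sketch counts query-key pairs in cross-attention using the symbols from Figure 3 ($N$ proposal polygons, $M$ vertices per polygon, $H \times W$ RoI resolution). The function name, the image feature-map size, and all numeric values are hypothetical choices for illustration; the count ignores the feature dimension $C$, self-attention, and the cost of RoIAlign itself, so it is not a reproduction of the paper's FLOP figures.

```python
def attention_pair_counts(n_polygons, m_vertices, roi_hw, image_hw):
    """Count query-key pairs for global vs. RoI-constrained cross-attention.

    n_polygons: N, proposal polygons per image.
    m_vertices: M, vertex queries per polygon.
    roi_hw:     (H, W), resolution of one RoI feature map.
    image_hw:   (height, width) of the full image feature map (assumed).
    """
    roi_h, roi_w = roi_hw
    img_h, img_w = image_hw
    total_queries = n_polygons * m_vertices

    # Global attention: every vertex query attends to every image token.
    global_pairs = total_queries * img_h * img_w

    # RoI attention: each query attends only to its own building's H x W tokens.
    roi_pairs = total_queries * roi_h * roi_w

    return global_pairs, roi_pairs


# Hypothetical sizes: N = 100, M = 30, 16 x 16 RoIs, 80 x 80 image feature map.
global_pairs, roi_pairs = attention_pair_counts(100, 30, (16, 16), (80, 80))
ratio = global_pairs / roi_pairs
print(global_pairs, roi_pairs, ratio)
```

Under these assumed sizes, the reduction factor equals the ratio of image tokens to RoI tokens and does not depend on $N$ or $M$, which is why confining each query to its building's RoI pays off most on large images with small buildings.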