PolyR-CNN: R-CNN for end-to-end polygonal building outline extraction

Weiqin Jiao, Claudio Persello, George Vosselman

TL;DR

PolyR-CNN is a simple, end-to-end R-CNN that extracts polygonal building outlines by predicting ordered polygon vertices directly from RoI features. It introduces a vertex proposal feature that injects geometric detail into the RoI representations, and it is trained with set-based Hungarian matching and joint bounding-box and polygon losses. The method achieves competitive accuracy on the CrowdAI dataset (79.2 AP) and can handle buildings with holes on the Inria dataset, while running 2.5 times faster and being four times lighter than the prior end-to-end method PolyWorld. This makes it practical for producing large-scale vector building datasets without heavy multi-stage architectures or reliance on segmentation masks.
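
As a rough illustration of the set-based matching mentioned above, the sketch below pairs each ground-truth building with a distinct prediction so that the summed cost is minimal. It is a minimal sketch under stated assumptions, not the authors' implementation: the cost terms (L1 distance on boxes and on flattened polygon vertices), the weights w_box and w_poly, and all function names are hypothetical. It also assumes predictions and ground truth share the same vertex count and vertex ordering. For brevity the optimum is found by brute-force enumeration rather than by the Hungarian algorithm, which returns the same minimum-cost assignment far more efficiently.

```python
from itertools import permutations


def l1(a, b):
    """Sum of absolute differences between two equal-length coordinate lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, gt, w_box=1.0, w_poly=1.0):
    """Joint box + polygon cost of pairing one prediction with one ground truth.

    The terms and weights are assumptions made for this illustration.
    """
    return w_box * l1(pred["box"], gt["box"]) + w_poly * l1(pred["poly"], gt["poly"])


def match_sets(preds, gts):
    """Return (total_cost, pairs), where pairs[k] is the prediction index for gts[k].

    Brute force over all one-to-one assignments; only viable for tiny sets.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground truths")
    best_cost, best = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best_cost, list(best)


# Toy data in pixel coordinates: boxes are [x1, y1, x2, y2],
# polygons are flattened [x, y, ...] lists with M = 4 vertices each.
preds = [
    {"box": [10, 10, 40, 40], "poly": [10, 10, 40, 10, 40, 40, 10, 40]},
    {"box": [60, 60, 90, 90], "poly": [60, 60, 90, 60, 90, 90, 60, 90]},
    {"box": [0, 50, 20, 70], "poly": [0, 50, 20, 50, 20, 70, 0, 70]},
]
gts = [
    {"box": [62, 60, 90, 88], "poly": [62, 60, 90, 60, 90, 88, 62, 88]},
    {"box": [10, 12, 40, 40], "poly": [10, 12, 40, 12, 40, 40, 10, 40]},
]

total, pairs = match_sets(preds, gts)
print(pairs, total)  # ground truth 0 -> prediction 1, ground truth 1 -> prediction 0
```

Predictions left unmatched (index 2 here) would be treated as background during training. In such a setup the box and polygon losses are computed only on the matched pairs.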

Abstract

Polygonal building outline extraction has been a research focus in recent years. Most existing methods address this challenging task by decomposing it into several subtasks and employing carefully designed architectures. Despite their accuracy, such pipelines often introduce inefficiencies during training and inference. This paper presents an end-to-end framework, denoted as PolyR-CNN, which offers an efficient and fully integrated approach to predicting vectorized building polygons and bounding boxes directly from remotely sensed images. Notably, PolyR-CNN uses only the features of the Region of Interest (RoI) for the prediction, thereby removing the need for complex designs. Furthermore, we propose a novel scheme within PolyR-CNN, termed the vertex proposal feature, that extracts detailed outline information from polygon vertex coordinates and guides the RoI features to predict more regular buildings. On the Inria dataset, PolyR-CNN demonstrates the capacity to deal with buildings with holes through a simple post-processing method. Comprehensive experiments on the CrowdAI dataset show that PolyR-CNN achieves competitive accuracy compared to state-of-the-art methods while significantly improving computational efficiency: it reaches 79.2 Average Precision (AP), a 15.9 AP gain over the well-established end-to-end method PolyWorld, while running 2.5 times faster and being four times lighter. With the backbone replaced by a simple ResNet-50, PolyR-CNN still reaches 71.1 AP while running four times faster than PolyWorld.

Paper Structure (18 sections, 4 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: The two pipelines of existing polygonal building outline extraction methods.
  • Figure 2: The overall architecture of PolyR-CNN. The backbone provides multi-scale feature maps. The proposal polygons and proposal bounding boxes are initialized and iteratively refined through 6 consecutive layers. Each layer contains four modules: RoI feature extraction (light yellow), vertex proposal feature extraction (light blue), RoI feature guidance (light orange) and task-specific prediction heads (light green). The dimensions of each variable are also shown in the figure, where $BS$ denotes batch size, $N$ represents the number of proposals per image, $M$ is the unified number of vertices per polygon and $d$ is the feature dimension. The final building polygon is generated by filtering out redundant vertices based on the vertex classification scores.
  • Figure 3: Qualitative comparison of building extraction results on the CrowdAI test dataset among ground truth, FFL (Girard et al., 2021), HiSup (Xu et al., 2022) and PolyR-CNN (from left to right).
  • Figure 4: Qualitative comparison on the Inria test dataset between HiSup (Xu et al., 2022) and PolyR-CNN.
  • Figure 5: Examples of predicted building outlines with internal holes on the Inria test images.
  • ...and 1 more figure
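
The last step in the Figure 2 caption, filtering out redundant vertices by their classification scores, can be sketched as follows. This is only an illustration under assumptions: the 0.5 threshold, the rule that fewer than three surviving vertices yields no polygon, and the function name filter_vertices are hypothetical and not taken from the paper.

```python
def filter_vertices(vertices, scores, threshold=0.5):
    """Keep vertices scored at or above `threshold`, preserving their order.

    vertices: list of M (x, y) tuples for one proposal polygon.
    scores:   list of M vertex classification scores in [0, 1].
    """
    if len(vertices) != len(scores):
        raise ValueError("need exactly one score per vertex")
    kept = [v for v, s in zip(vertices, scores) if s >= threshold]
    # Assumption: fewer than three vertices cannot form a polygon.
    return kept if len(kept) >= 3 else []


# One proposal with M = 6 vertex slots, two of which are redundant.
vertices = [(10, 10), (25, 10), (40, 10), (40, 40), (25, 41), (10, 40)]
scores = [0.98, 0.07, 0.95, 0.91, 0.12, 0.88]

polygon = filter_vertices(vertices, scores)
print(polygon)  # [(10, 10), (40, 10), (40, 40), (10, 40)]
```

Because every proposal carries the same number M of vertex slots, this kind of filtering is what lets a fixed-size prediction represent buildings with different vertex counts.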