Prompt-Driven Building Footprint Extraction in Aerial Images with Offset-Building Model

Kai Li; Yupeng Deng; Yunlong Kong; Diyou Liu; Jingbo Chen; Yu Meng; Junxian Ma; Chenhao Wang

TL;DR

This paper tackles building footprint extraction from very-high-resolution aerial imagery by moving to a prompt-based paradigm. It introduces the Offset-Building Model (OBM), which extends Segment Anything Model (SAM) with a Reference Offset Augment Module (ROAM) and a Distance-NMS framework to predict roof segmentation and precise roof-to-footprint offsets. The authors propose a comprehensive prompt-based evaluation framework and demonstrate that OBM achieves superior roof IoU and offset direction accuracy, with notable generalization to new datasets like Huizhou and OmniCity. The work also presents a new Huizhou test set for robust cross-domain validation and shows that prompt-level metrics can better reflect footprint quality in production settings. Overall, OBM, DNMS, and ROAM collectively enable accurate, scalable footprint extraction with reduced human intervention and improved generalization.

Abstract

More accurate extraction of invisible building footprints from very-high-resolution (VHR) aerial images relies on roof segmentation and roof-to-footprint offset extraction. Existing methods based on instance segmentation generalize poorly when extended to large-scale data production and fail to achieve low-cost human interaction. The prompt paradigm inspires us to design a promptable framework for roof and offset extraction, which transforms end-to-end algorithms into promptable methods. Within this framework, we propose a novel Offset-Building Model (OBM). Based on prompt prediction, we first discover a common pattern in offset prediction and tailor Distance-NMS (DNMS) algorithms for offset optimization. To rigorously evaluate the algorithm's capabilities, we introduce a prompt-based evaluation method, under which our model reduces offset errors by 16.6% and improves roof Intersection over Union (IoU) by 10.8% compared to other models. Leveraging the common pattern in offset prediction, the DNMS algorithms enable models to further reduce offset vector loss by 6.5%. To further validate the generalization of the models, we test them on a newly proposed test set, the Huizhou test set, with over 7,000 manually annotated instance samples. Our algorithms and dataset will be available at https://github.com/likaiucas/OBM.
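The relationship between roof mask, offset, and footprint described in the abstract can be illustrated with a minimal sketch. It assumes the offset is an integer pixel vector (dx, dy) pointing from roof to footprint, so the footprint is approximated by translating the roof mask. The function names shift_mask, iou, and offset_error are hypothetical and are not taken from the OBM repository; the exact metric definitions used in the paper may differ.

```python
import numpy as np

def shift_mask(mask, dx, dy):
    """Translate a binary mask by integer (dx, dy) pixels without wrap-around."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    # Source window whose shifted copy still lies inside the image.
    x0, x1 = max(0, -dx), min(w, w - dx)
    y0, y1 = max(0, -dy), min(h, h - dy)
    if x0 >= x1 or y0 >= y1:
        return out  # the mask was shifted completely out of the image
    out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = mask[y0:y1, x0:x1]
    return out

def iou(a, b):
    """Intersection over Union of two binary masks (0.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def offset_error(pred, gt):
    """Euclidean length of the difference between two offset vectors."""
    return float(np.hypot(pred[0] - gt[0], pred[1] - gt[1]))

# Assumed convention: footprint = roof mask translated by the roof-to-footprint offset.
roof = np.zeros((8, 8), dtype=np.uint8)
roof[1:3, 1:4] = 1
footprint = shift_mask(roof, dx=2, dy=3)
```

Under this convention, an error in the predicted offset moves the whole footprint, which is why roof IoU and offset error are reported as separate quantities.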

Paper Structure (30 sections, 8 equations, 10 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Methodology
Problem formulation
Offset Building Model (OBM)
Prompt Sampler
Offset Token and Offset Coder
Reference Offset Augment Module (ROAM)
Distance NMS and soft Distance NMS
ROI Prompt Based Offset Extraction
Relative Height Map
Metric method
Offset metrics
Mask metrics
Experiment
...and 15 more sections

Figures (10)

Figure 1: Red and green boundaries represent roofs and footprints, respectively. During large-scale data production, instance segmentation methods face generalization challenges. The listed results are from LOFT [a12], and the input images come from a real production process and are 100% unseen by LOFT. Apart from misrecognition, the problems manifest in two ways. First, these methods usually rely on post-processing algorithms. As the first picture shows, a strict NMS algorithm loses many instances. To address this, soft NMS [a14] is often applied to minimize the number of missing samples. However, the lower score thresholds of soft NMS cause one building to be matched with many fragmented instances, and the confusing results make it hard for data producers to choose the correct instances. Neighboring predicted buildings can be merged and fused, as shown in the third picture, but this makes densely packed buildings stick together, far from the desired result shown in the ground truth. Second, data producers have to plot the missing samples point by point because of the inflexible Region Proposal Network (RPN).
Figure 2: Given prompts, our model can extract the roof and footprint of each building and generate a relative height map.
Figure 3: In (a), OBM extends SAM by adding an offset prediction branch, namely ROAM. To adapt to diverse GPU capacities, we implement an optional Prompt Sampler for prompt selection. The Offset Tokens, along with the Prompt Tokens and Mask Tokens, are fed into the Decoder. Additionally, an Offset Coder similar to DETR's Box Coder enhances offset training. As shown in (b), ROAM is used for offset prediction and is composed of a Base Head and several Adaptive Heads. The Base Head first generates a reference offset and an indicator vector. The indicator vectors from the Base Head select offsets and then roam them to different Adaptive Offset Heads. The final output offsets are derived from both the Base Head and the Adaptive Heads.
Figure 4: During training, an RPN (Region Proposal Network) is used to produce boxes. These boxes are used with ROI Align to crop features, which are subsequently transformed into local features of identical size. These features are then fed into multitask heads for regression. In the inference phase, the RPN is deactivated, and the models use manually provided boxes for ROI extraction.
Figure 5: The relative height map is generated by fading the top roof to its footprint.
...and 5 more figures
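
The Figure 5 caption only says that the relative height map is produced by "fading the top roof to its footprint". One possible reading, offered as a sketch and not as the authors' implementation, is to slide copies of the roof mask along the offset from the footprint position to the roof position and paint each copy with a value that grows toward the roof. The linear ramp, the use of offset length in pixels as the peak value, and the names relative_height_map and steps are all assumptions.

```python
import numpy as np

def relative_height_map(roof, offset, steps=16):
    """Paint roof copies sliding from the footprint (low) up to the roof (high)."""
    dx, dy = offset  # assumed roof-to-footprint offset in pixels
    h, w = roof.shape
    ys, xs = np.nonzero(roof)
    height = np.zeros((h, w), dtype=float)
    length = float(np.hypot(dx, dy))  # assumed proxy for relative height
    for t in np.linspace(1.0, 0.0, steps):
        # t = 1 places the copy at the footprint, t = 0 at the roof itself.
        sx = np.rint(xs + t * dx).astype(int)
        sy = np.rint(ys + t * dy).astype(int)
        keep = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
        value = (1.0 - t) * length
        # Keep the highest value painted so far at every pixel.
        height[sy[keep], sx[keep]] = np.maximum(height[sy[keep], sx[keep]], value)
    return height

roof = np.zeros((10, 10), dtype=np.uint8)
roof[1:3, 1:3] = 1
hm = relative_height_map(roof, offset=(4, 4), steps=5)
```

In this sketch, roof pixels receive the largest value and pixels covered only by the footprint copy stay at zero, so the map fades from roof to ground along the offset direction.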
