PolyFootNet: Extracting Polygonal Building Footprints in Off-Nadir Remote Sensing Images

Kai Li, Yupeng Deng, Jingbo Chen, Yu Meng, Zhihao Xi, Junxian Ma, Chenhao Wang, Maolin Wang, Xiangyu Zhao

TL;DR

PolyFootNet tackles polygonal building footprint extraction in challenging off-nadir imagery by unifying roof vertex prompts, building segmentation, and offset learning within a transformer-based framework. It introduces Self Offset Attention (SOFA), based on Nadaraya-Watson regression, to reconcile direction-length discrepancies between long and short offsets, and explores multi-solution strategies that combine masks and offsets for improved footprints. The approach yields direct polygon outputs without post-processing, demonstrates strong generalization across BONAI, OmniCity-view3, and Huizhou, and provides ablations validating the value of HQ mask prompts, vertex tokens, and multi-source prompting. The work advances robust, precise, and automatic footprint extraction in oblique imagery, with public release of offset-prediction weights to spur further research.

Abstract

Extracting polygonal building footprints from off-nadir imagery is crucial for diverse applications. Current deep-learning-based extraction approaches predominantly rely on semantic segmentation paradigms and post-processing algorithms, limiting their boundary precision and applicability. However, existing polygonal extraction methodologies are inherently designed for near-nadir imagery and fail under the geometric complexities introduced by off-nadir viewing angles. To address these challenges, this paper introduces Polygonal Footprint Network (PolyFootNet), a novel deep-learning framework that directly outputs polygonal building footprints without requiring external post-processing steps. PolyFootNet employs a High-Quality Mask Prompter to generate precise roof masks, which guide polygonal vertex extraction in a unified model pipeline. A key contribution of PolyFootNet is introducing the Self Offset Attention mechanism, grounded in Nadaraya-Watson regression, to effectively mitigate the accuracy discrepancy observed between low-rise and high-rise buildings. This approach allows low-rise building predictions to leverage angular corrections learned from high-rise building offsets, significantly enhancing overall extraction accuracy. Additionally, motivated by the inherent ambiguity of building footprint extraction tasks, we systematically investigate alternative extraction paradigms and demonstrate that a combined approach of building masks and offsets achieves superior polygonal footprint results. Extensive experiments validate PolyFootNet's effectiveness, illustrating its promising potential as a robust, generalizable, and precise polygonal building footprint extraction method from challenging off-nadir imagery. To facilitate further research, we will release pre-trained weights of our offset prediction module at https://github.com/likaiucas/PolyFootNet.
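The idea of letting short (low-rise) offsets borrow direction from long (high-rise) offsets can be illustrated with a plain Nadaraya-Watson estimator. The following is only a minimal sketch under stated assumptions, not the paper's SOFA block: the function names, the temperature `tau`, and the kernel that favours long offsets are all hypothetical, and the sketch assumes that offsets within one image point in roughly the same direction.

```python
import math


def nadaraya_watson(query, keys, values, kernel):
    """Kernel-weighted average: sum_j K(q, k_j) v_j / sum_j K(q, k_j)."""
    weights = [kernel(query, k) for k in keys]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def refine_offsets(offsets, tau=5.0):
    """Keep each offset's length; replace its direction with a
    length-weighted direction estimate (hypothetical helper)."""
    if not offsets:
        return []
    lengths = [math.hypot(dx, dy) for dx, dy in offsets]
    # Unit directions are the regression targets; zero offsets map to (0, 0).
    dirs = [(dx / n, dy / n) if n > 0 else (0.0, 0.0)
            for (dx, dy), n in zip(offsets, lengths)]
    longest = max(lengths)

    # Assumed kernel: ignores the query and grows with key length, so
    # long offsets dominate. Subtracting `longest` avoids overflow.
    def kernel(_query, key):
        return math.exp((key - longest) / tau)

    refined = []
    for n in lengths:
        ux, uy = nadaraya_watson(n, lengths, dirs, kernel)
        norm = math.hypot(ux, uy)
        if norm == 0:
            refined.append((0.0, 0.0))
        else:
            # Re-normalise the averaged direction, then restore the length.
            refined.append((n * ux / norm, n * uy / norm))
    return refined


# One long offset (length 50) and one short, badly oriented offset (length 2).
refined = refine_offsets([(30.0, 40.0), (0.0, 2.0)])
```

In this toy input the short offset keeps its length of 2 but is rotated towards the direction of the long offset. A learned attention block would replace the hand-written kernel with trained weights.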

Paper Structure

This paper contains 32 sections, 12 equations, 8 figures, 9 tables, 1 algorithm.

Figures (8)

  • Figure 1: Previous approaches primarily relied on Paradigm (a) for extracting building footprints. In this study, we explore the extraction of building footprints using task decomposition based on Paradigms (b) and (c). Additionally, this work achieves the first implementation of Paradigm (d), enabling the direct extraction of polygonal footprints by the model via decoding offset tokens and vertex tokens. Compared to Paradigms (a), (b), and (c), which depend on post-processing algorithms to extract final results, the proposed method eliminates this dependency.
  • Figure 2: The main structures of PolyFootNet and SOFA. (a) PolyFootNet's newly added Proposal Networks allow the model to extract buildings automatically. At the prompt level, a roof vertex task is added so that the model can directly compute roof vertex locations, and the footprint polygon is calculated directly from the coordinates. (b) The SOFA block in detail: the encoded offsets output by the Feed Forward Network (FFN) are fed to SOFA, and the adjusted offsets are then passed to the offset coders, which compute the final output offsets.
  • Figure 3: (a) shows the predicted roof and building for one building. (b) displays the critical condition for regressing the building offset. (c) abstracts the situation in (b).
  • Figure 4: Main results extracted by prompting mode. The green lines represent predicted building footprint boundaries, and the yellow points are key nodes of the building footprints.
  • Figure 5: Extracting footprints with multiple solutions to building footprint extraction (BFE).
  • ...and 3 more figures
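
The Figure 2 caption notes that the footprint polygon is computed directly on coordinates. A minimal sketch of that step, assuming the footprint is the roof polygon translated by a single per-building offset: the function names and the sign convention (offset points from roof to footprint, in pixels) are illustrative assumptions, not taken from the paper.

```python
def roof_to_footprint(roof_vertices, offset):
    """Translate every roof vertex by the roof-to-footprint offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in roof_vertices]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical roof polygon and offset, both in pixel coordinates.
roof = [(10, 10), (20, 10), (20, 18), (10, 18)]
footprint = roof_to_footprint(roof, (3, -4))
```

Because the operation is a pure translation, the footprint has the same shape and area as the roof, and no mask vectorization step is needed to obtain the polygon.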