PolyFootNet: Extracting Polygonal Building Footprints in Off-Nadir Remote Sensing Images
Kai Li, Yupeng Deng, Jingbo Chen, Yu Meng, Zhihao Xi, Junxian Ma, Chenhao Wang, Maolin Wang, Xiangyu Zhao
TL;DR
PolyFootNet tackles polygonal building footprint extraction in challenging off-nadir imagery by unifying roof vertex prompts, building segmentation, and offset learning within a transformer-based framework. It introduces Self Offset Attention (SOFA), based on Nadaraya-Watson regression, to reconcile direction-length discrepancies between long and short offsets, and explores multi-solution strategies that combine masks and offsets for improved footprints. The approach yields direct polygon outputs without post-processing, demonstrates strong generalization across BONAI, OmniCity-view3, and Huizhou, and provides ablations validating the value of HQ mask prompts, vertex tokens, and multi-source prompting. The work advances robust, precise, and automatic footprint extraction in oblique imagery, with public release of offset-prediction weights to spur further research.
Abstract
Extracting polygonal building footprints from off-nadir imagery is crucial for diverse applications. Current deep-learning-based extraction approaches predominantly rely on semantic segmentation paradigms and post-processing algorithms, which limits their boundary precision and applicability. Moreover, existing polygonal extraction methodologies are inherently designed for near-nadir imagery and fail under the geometric complexities introduced by off-nadir viewing angles. To address these challenges, this paper introduces the Polygonal Footprint Network (PolyFootNet), a novel deep-learning framework that directly outputs polygonal building footprints without requiring external post-processing steps. PolyFootNet employs a High-Quality Mask Prompter to generate precise roof masks, which guide polygonal vertex extraction in a unified model pipeline. A key contribution of PolyFootNet is the Self Offset Attention (SOFA) mechanism, grounded in Nadaraya-Watson regression, which mitigates the accuracy discrepancy observed between low-rise and high-rise buildings. This approach allows low-rise building predictions to leverage angular corrections learned from high-rise building offsets, significantly enhancing overall extraction accuracy. Additionally, motivated by the inherent ambiguity of building footprint extraction tasks, we systematically investigate alternative extraction paradigms and demonstrate that combining building masks and offsets achieves superior polygonal footprint results. Extensive experiments validate PolyFootNet's effectiveness, illustrating its potential as a robust, generalizable, and precise polygonal building footprint extraction method for challenging off-nadir imagery. To facilitate further research, we will release pre-trained weights of our offset prediction module at https://github.com/likaiucas/PolyFootNet.
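
For readers unfamiliar with Nadaraya-Watson regression, the estimator that the abstract names as the basis of Self Offset Attention, a minimal generic sketch follows. It is not the PolyFootNet implementation: the function name `nadaraya_watson`, the Gaussian kernel, the `bandwidth` parameter, and the toy offset data are all assumptions chosen for illustration. The sketch only shows how a kernel-weighted average lets each query borrow values from nearby keys, which is the attention-like behaviour the estimator provides.

```python
import numpy as np


def nadaraya_watson(queries, keys, values, bandwidth=1.0):
    """Generic Nadaraya-Watson estimate with a Gaussian kernel (illustrative only).

    queries: (Q,) scalar query locations
    keys:    (K,) scalar key locations
    values:  (K,) or (K, D) values attached to the keys
    Returns a kernel-weighted average of `values` for each query.
    """
    queries = np.asarray(queries, dtype=float).reshape(-1, 1)  # (Q, 1)
    keys = np.asarray(keys, dtype=float).reshape(1, -1)        # (1, K)
    values = np.asarray(values, dtype=float)

    # Gaussian kernel in log space; subtracting the row maximum keeps exp() stable.
    logits = -0.5 * ((queries - keys) / bandwidth) ** 2        # (Q, K)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)              # each row sums to 1

    return weights @ values


# Hypothetical toy data: offset lengths in pixels and offset directions as unit vectors.
lengths = np.array([2.0, 3.0, 30.0, 40.0])
angles = np.deg2rad([70.0, 20.0, 44.0, 46.0])
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Each offset's direction is replaced by a weighted average over all offsets.
smoothed = nadaraya_watson(lengths, lengths, directions, bandwidth=50.0)
```

A small `bandwidth` makes each query rely almost entirely on its nearest key, while a large one approaches a plain average over all keys. How the paper defines its kernel and which quantities serve as queries, keys, and values is not stated in the abstract, so those choices here are placeholders.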
