BFA-YOLO: A balanced multiscale object detection network for building façade attachments detection

Yangguang Chen, Tong Wang, Guanzhou Chen, Kun Zhu, Xiaoliang Tan, Jiaqi Wang, Wenchao Guo, Qing Wang, Xiaolong Luo, Xiaodong Zhang

TL;DR

The paper addresses the detection of building façade attachments in urban scenes, where objects are unevenly distributed, often small, and set against cluttered backgrounds. It introduces BFA-YOLO, a YOLOv8-based detector augmented with three new modules: the Feature Balanced Spindle Module (FBSM), the Target Dynamic Alignment Task Detection Head (TDATH), and Position Memory Enhanced Self-Attention (PMESA). It also introduces BFA-3D, a new multi-view dataset of UAV-rendered façade images. Compared with the YOLOv8 baseline, BFA-YOLO improves mAP$_{50}$ by $1.8\%$ on BFA-3D and by $2.9\%$ on Façade-WHU, and the ablations and qualitative analyses indicate better handling of small objects and background noise. By providing a richly annotated multi-view dataset and a detector tailored to façade attachments, the work supports automated BIM workflows and urban scene understanding, with potential impact on CityGML LOD3 compliance and downstream 3D modeling tasks.

Abstract

The detection of façade elements on buildings, such as doors, windows, balconies, air conditioning units, billboards, and glass curtain walls, is a critical step in automating Building Information Modeling (BIM). Yet this field faces significant challenges, including the uneven distribution of façade elements, the presence of small objects, and substantial background noise, all of which hamper detection accuracy. To address these issues, we develop the BFA-YOLO model and the BFA-3D dataset in this study. The BFA-YOLO model is an architecture designed specifically for analyzing multi-view images of façade attachments. It integrates three novel components: the Feature Balanced Spindle Module (FBSM), which tackles the issue of uneven object distribution; the Target Dynamic Alignment Task Detection Head (TDATH), which enhances the detection of small objects; and the Position Memory Enhanced Self-Attention Mechanism (PMESA), which reduces the impact of background noise. Together, these components enable BFA-YOLO to address each challenge, improving model robustness and detection precision. The BFA-3D dataset offers multi-view images with precise annotations across a wide range of façade attachment categories. It was developed to address the limitations of existing façade detection datasets, which often feature a single perspective and insufficient category coverage. In comparative experiments, BFA-YOLO improved mAP$_{50}$ by 1.8\% on the BFA-3D dataset and by 2.9\% on the public Façade-WHU dataset relative to the baseline YOLOv8 model. These results highlight the superior performance of BFA-YOLO in façade element detection and its contribution to intelligent BIM technologies.

Paper Structure (20 sections, 1 equation, 13 figures, 7 tables)

This paper contains 20 sections, 1 equation, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Image rendering schematic. The pentagram marks the position of the simulated camera, the six view cones represent different horizontal viewing angles, and the two small images on the far right show the same horizontal position at two different vertical viewing angles.
  • Figure 2: Illustration of the multi-stage data labeling and verification process for the BFA-3D dataset.
  • Figure 3: Statistics of the object bounding boxes in the BFA-3D dataset. Panel (a) shows the distribution of object positions in the image; the horizontal and vertical axes give the ratio of the label center coordinates to the image width and height, respectively. The middle of the plot is darker, which indicates that objects are mostly located near the center of the image. Panel (b) shows the size of the objects relative to the image; the darker colors near the origin indicate that the dataset contains many small objects.
  • Figure 4: The network architecture of the BFA-YOLO model. The modules shown in bold (FBSM, TDATH, and PMESA) are the new modules proposed in this paper.
  • Figure 5: The Feature Balanced Spindle Module (FBSM) structure diagram.
  • ...and 8 more figures
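
The two statistics described in the Figure 3 caption (box centers and box sizes, each expressed as a ratio of the image dimensions) can be sketched as follows. This is a minimal illustration, not the authors' code: the pixel-coordinate box layout `(x_min, y_min, x_max, y_max)`, the function names `normalize_box` and `histogram_2d`, and the bin count are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumed box layout (not taken from the paper): pixel coordinates
# (x_min, y_min, x_max, y_max) with the origin at the top-left corner.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, img_w: float, img_h: float) -> Tuple[float, float, float, float]:
    """Return (cx, cy, w, h) as ratios of the image width and height."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / img_w   # center x / image width
    cy = (y_min + y_max) / 2.0 / img_h   # center y / image height
    w = (x_max - x_min) / img_w          # box width / image width
    h = (y_max - y_min) / img_h          # box height / image height
    return cx, cy, w, h


def histogram_2d(points: Sequence[Tuple[float, float]], bins: int = 10) -> List[List[int]]:
    """Count points with coordinates in [0, 1] on a bins x bins grid.

    grid[row][col] holds the count for the y-bin `row` and the x-bin `col`.
    A value of exactly 1.0 is clamped into the last bin.
    """
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical labels for a single 1000 x 800 image.
    boxes = [(100, 200, 300, 400), (450, 350, 550, 450)]
    normalized = [normalize_box(b, 1000, 800) for b in boxes]
    centers = histogram_2d([(cx, cy) for cx, cy, _, _ in normalized], bins=4)
    sizes = histogram_2d([(w, h) for _, _, w, h in normalized], bins=4)
    print(centers)
    print(sizes)
```

Plotting the first grid as a heatmap gives a position map like panel (a), and the second gives a size map like panel (b), where a dense cell near the origin corresponds to many small boxes.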