Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning

Weijia Li; Jinhua Yu; Dairong Chen; Yi Lin; Runmin Dong; Xiang Zhang; Conghui He; Haohuan Fu

TL;DR

This paper tackles fine-grained building function recognition from street-view imagery by integrating GIS data through a geometry-aware, three-stage semi-supervised framework. The three stages are online pre-training for facade detection, offline coarse-annotation generation via cross-view geometry, and online recognition trained on the coarse labels together with the existing labeled data, which enables large-scale, cross-city deployment with limited annotations. Within the same categorization system, the approach improves on fully-supervised methods by 7.6% and on state-of-the-art semi-supervised methods by 4.8%. It also transfers from New York (OmniCity) to Los Angeles and Boston, which use different building function categorization systems, offering a practical path to scalable urban analytics with reduced labeling requirements.

Abstract

In this work, we propose a geometry-aware semi-supervised framework for fine-grained building function recognition. It uses geometric relationships among multi-source data to improve pseudo-label accuracy in semi-supervised learning, which broadens its applicability to various building function categorization systems. In the first stage, we design an online semi-supervised pre-training stage that facilitates precise localization of building facades in street-view images. In the second stage, we propose a geometry-aware coarse annotation generation module. This module combines GIS data and street-view data based on their geometric relationships, improving the accuracy of the pseudo annotations. In the third stage, we combine the newly generated coarse annotations with the existing labeled dataset to achieve fine-grained functional recognition of buildings across multiple cities at a large scale. Extensive experiments demonstrate that the proposed framework achieves superior performance in fine-grained functional recognition of buildings. Within the same categorization system, it achieves improvements of 7.6% and 4.8% over fully-supervised methods and state-of-the-art semi-supervised methods, respectively. Our method also performs well in cross-city scenarios, i.e., when the model trained on OmniCity (New York) is extended to new cities (Los Angeles and Boston) with different building function categorization systems. This study offers a new solution for large-scale multi-city applications with minimal annotation requirements, facilitating more efficient data updates and resource allocation in urban management.
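As a rough illustration of how a GIS attribute could be attached to a detected facade through cross-view geometry, the sketch below maps each building's compass bearing from the camera to a pixel column and labels the nearest facade box. It is not the authors' implementation: it assumes an equirectangular panorama whose centre column looks along the camera heading, a local metric coordinate frame with +y pointing north, and building centroids instead of full footprints. All function names and the `max_gap_px` tolerance are hypothetical.

```python
import math


def bearing_deg(cam_xy, pt_xy):
    """Compass bearing from camera to point in degrees: 0 = north (+y), 90 = east (+x)."""
    dx = pt_xy[0] - cam_xy[0]
    dy = pt_xy[1] - cam_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def bearing_to_column(bearing, heading, width):
    """Map a bearing to a pixel column, assuming the image centre looks along `heading`."""
    rel = (bearing - heading + 180.0) % 360.0  # 180 degrees lands on the centre column
    return rel / 360.0 * width


def assign_labels(boxes, buildings, cam_xy, heading, width, max_gap_px=50.0):
    """Label each facade box (x_min, x_max) with the closest building in column space.

    `buildings` is a list of (centroid_xy, function_label) pairs taken from GIS data.
    Boxes with no building within `max_gap_px` columns stay unlabeled (None).
    """
    labels = []
    for x_min, x_max in boxes:
        centre = 0.5 * (x_min + x_max)
        best_label, best_gap = None, max_gap_px
        for centroid, label in buildings:
            col = bearing_to_column(bearing_deg(cam_xy, centroid), heading, width)
            gap = abs(col - centre)
            gap = min(gap, width - gap)  # the panorama wraps around horizontally
            if gap <= best_gap:
                best_label, best_gap = label, gap
        labels.append(best_label)
    return labels


if __name__ == "__main__":
    buildings = [((0.0, 10.0), "residential"), ((10.0, 0.0), "commercial")]
    boxes = [(1750, 1850), (2680, 2720), (0, 100)]
    print(assign_labels(boxes, buildings, (0.0, 0.0), 0.0, 3600))
```

Leaving unmatched boxes unlabeled rather than forcing a label mirrors the idea of filtering coarse annotations, although the actual filtering criteria in the paper may differ.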

Paper Structure (29 sections, 9 equations, 9 figures, 7 tables)

Introduction
Related work
Fine-grained building attributes recognition based on street-view images
Semi-supervised learning
Datasets
Street-view dataset
Building function and footprint information
Street-view image annotation
Methods
Overall framework
Online pre-training for building facade recognition
Semi-supervised network architecture
Semi-supervised network training
Offline building function annotation generation
Cross-view data transformation
...and 14 more sections

Figures (9)

Figure 1: An overview of the proposed geometry-aware semi-supervised framework, which includes: (a) an online semi-supervised object detection framework for building facade detection, (b) a new module for coarse annotation generation based on GIS data and street-view images, and (c) one-stage building function recognition.
Figure 2: An overview of the datasets used in this paper. (a) The data come primarily from three cities in the United States: New York (OmniCity; Li et al., 2023), Los Angeles, and Boston. (b) All street-view images from New York include pixel-level annotations of fine-grained building functions, while Los Angeles and Boston are entirely new regions with only a few annotations available. (c) GIS data consist of building function information and footprint information; the function attributes come from publicly available government datasets, and the building footprints are sourced from OSM.
Figure 3: Coarse annotation generation module. (a) Fine-grained building function extraction from GIS data using a ray-tracing method. (b) Coarse annotation generation and filtering based on angular relationships.
Figure 4: Visual comparisons of our method and fully-supervised methods on the fine-grained building function recognition task with the same labeled data. (a)-(c) show the results of RetinaNet, FCOS, and GFL, respectively, which use a one-stage detection strategy. (d)-(f) show the results of Faster R-CNN, Cascade R-CNN, and Dynamic R-CNN, respectively, which use a two-stage detection strategy. (g) and (h) show the results of our method and the ground truth, respectively.
Figure 5: Visual comparisons of our method and state-of-the-art semi-supervised methods on the fine-grained building function recognition task with a $1:10$ labeled-to-unlabeled ratio. (a)-(e) show the results of Mean-teacher, Soft-teacher, PseCo, ARSL, and Consistent-teacher, respectively, (f) shows the results of our method, and (g) shows the ground truth.
...and 4 more figures
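The ray-tracing step named in the Figure 3 caption can be pictured with a small 2D ray-casting sketch: rays are cast from the camera position at fixed angular steps, and the first footprint edge each ray meets identifies the building visible in that direction, so occluded buildings are never selected. This is an illustrative approximation under stated assumptions, not the paper's procedure: footprints are simple polygons in a local metric frame with +y pointing north, and the function names and the one-degree step are hypothetical.

```python
import math


def ray_segment_distance(origin, angle_deg, p, q):
    """Distance along a ray (compass bearing `angle_deg`) to segment pq, or None if it misses."""
    dx = math.sin(math.radians(angle_deg))
    dy = math.cos(math.radians(angle_deg))
    ex, ey = q[0] - p[0], q[1] - p[1]
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None  # ray and segment are (nearly) parallel
    wx, wy = p[0] - origin[0], p[1] - origin[1]
    t = (wx * ey - wy * ex) / denom  # position along the ray
    u = (wx * dy - wy * dx) / denom  # position along the segment, 0..1
    if t > 0.0 and 0.0 <= u <= 1.0:
        return t
    return None


def first_hit(origin, angle_deg, footprints):
    """Return (building_id, distance) of the nearest footprint hit by the ray, or None."""
    best = None
    for building_id, ring in footprints.items():
        for i in range(len(ring)):
            p, q = ring[i], ring[(i + 1) % len(ring)]  # the ring is implicitly closed
            t = ray_segment_distance(origin, angle_deg, p, q)
            if t is not None and (best is None or t < best[1]):
                best = (building_id, t)
    return best


def visible_buildings(origin, footprints, step_deg=1.0):
    """Collect the ids of buildings that are the first hit for at least one ray."""
    visible = set()
    for k in range(int(round(360.0 / step_deg))):
        hit = first_hit(origin, k * step_deg, footprints)
        if hit is not None:
            visible.add(hit[0])
    return visible


if __name__ == "__main__":
    footprints = {
        "A": [(-5, 10), (5, 10), (5, 20), (-5, 20)],   # directly north of the camera
        "B": [(-5, 30), (5, 30), (5, 40), (-5, 40)],   # hidden behind A
        "C": [(10, -5), (20, -5), (20, 5), (10, 5)],   # east of the camera
    }
    print(sorted(visible_buildings((0.0, 0.0), footprints)))
```

In this toy layout building B lies entirely behind A, so only A and C are reported as visible. Once the visible building is known, its function attribute from the GIS data can serve as the candidate label for the corresponding facade.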
