Efficient Building Roof Type Classification: A Domain-Specific Self-Supervised Approach
Guneet Mutreja, Ksenia Bittner
TL;DR
The paper addresses roof-type classification from aerial imagery under label scarcity by employing a domain-specific self-supervised framework built on EfficientNet backbones augmented with CBAM. It demonstrates that pretraining on the domain-specific AID dataset, combined with contrastive self-supervised methods such as SimCLR, yields high validation accuracy (up to 95.5%) that rivals transformer-based models while using far fewer parameters. The approach generalizes well to unseen test sets (Braunschweig and Roof Graph) and delivers strong computational efficiency compared with ResNet baselines. Overall, the work presents a practical, data-efficient paradigm for remote-sensing roof classification that leverages domain-specific pretraining and attention-enhanced networks to perform well with limited labeled data.
Abstract
Accurate classification of building roof types from aerial imagery is crucial for remote sensing applications such as urban planning, disaster management, and infrastructure monitoring. However, this task is often hindered by the limited availability of labeled data for supervised learning. To address this challenge, this paper investigates the effectiveness of self-supervised learning with EfficientNet architectures, known for their computational efficiency, for building roof type classification. We propose a novel framework that incorporates a Convolutional Block Attention Module (CBAM) to enhance the feature extraction capabilities of EfficientNet. Furthermore, we explore the benefits of pretraining on a domain-specific dataset, the Aerial Image Dataset (AID), compared to ImageNet pretraining. In our experiments, the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) with EfficientNet-B3 and CBAM achieves 95.5% accuracy on our validation set, matching the performance of state-of-the-art transformer-based models while using significantly fewer parameters. We also provide a comprehensive evaluation on two challenging test sets, demonstrating the generalization capability of our method. Notably, our findings highlight the effectiveness of domain-specific pretraining, which consistently yields higher accuracy than pretraining on the generic ImageNet dataset. Our work establishes EfficientNet-based self-supervised learning as a computationally efficient and highly effective approach for building roof type classification, particularly in scenarios with limited labeled data.
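For readers unfamiliar with SimCLR, the contrastive objective it is generally associated with (the normalized temperature-scaled cross-entropy, or NT-Xent, loss) can be sketched in a few lines. The following is a minimal, dependency-free illustration and not the authors' implementation: the function names `cosine` and `nt_xent_loss`, the temperature of 0.5, and the toy two-dimensional embeddings are all assumptions made for this example. In practice the embeddings would come from the encoder (here, an EfficientNet with CBAM) followed by a projection head.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    The temperature value is an assumption, not taken from the paper.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding in the batch (self excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


# Toy batch: two "images", each with two slightly different views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]
aligned = nt_xent_loss(view_a, view_b)
# Pairing each view with the wrong image should give a higher loss.
mismatched = nt_xent_loss(view_a, list(reversed(view_b)))
```

The loss is small when the two views of the same image are more similar to each other than to the other embeddings in the batch, which is the property that lets an encoder learn from unlabeled aerial imagery before fine-tuning on a small labeled roof-type set.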
