Hierarchical Semi-Supervised Active Learning for Remote Sensing
Wei Huang, Zhitong Xiong, Chenying Liu, Xiao Xiang Zhu
TL;DR
The paper tackles label scarcity in remote sensing (RS) by combining semi-supervised learning (SSL) with a novel hierarchical active learning (HAL) strategy in an iterative loop, a framework called HSSAL. HAL provides scalable, diverse, and uncertainty-aware sample querying, while SSL (via weak-to-strong self-training) expands the effective training set with unlabeled data. On UCM, AID, and NWPU-RESISC45, HSSAL consistently outperforms SSL-only and AL-only baselines, reaching over 95% of fully-supervised accuracy with only 8%, 4%, and 2% labeled data, respectively, which demonstrates strong label efficiency. The approach uses a DINOv2-based encoder, gradient-based uncertainty, and spectral clustering to explore data manifolds efficiently, suggesting broad applicability to RS tasks and potential extension to dense prediction problems.
Abstract
The performance of deep learning models in remote sensing (RS) strongly depends on the availability of high-quality labeled data. However, collecting large-scale annotations is costly and time-consuming, while vast amounts of unlabeled imagery remain underutilized. To address this challenge, we propose a Hierarchical Semi-Supervised Active Learning (HSSAL) framework that integrates semi-supervised learning (SSL) and a novel hierarchical active learning (HAL) strategy in a closed iterative loop. In each iteration, SSL refines the model using labeled data through supervised learning and unlabeled data through weak-to-strong self-training, improving both feature representation and uncertainty estimation. Guided by the refined representations and uncertainty cues of the unlabeled samples, HAL then queries samples through a progressive clustering strategy, selecting the most informative instances that jointly satisfy the criteria of scalability, diversity, and uncertainty. This hierarchical process ensures both efficiency and representativeness in sample selection. Extensive experiments on three benchmark RS scene classification datasets (UCM, AID, and NWPU-RESISC45) demonstrate that HSSAL consistently outperforms SSL-only and AL-only baselines. Remarkably, with only 8%, 4%, and 2% labeled training data on UCM, AID, and NWPU-RESISC45, respectively, HSSAL achieves over 95% of fully-supervised accuracy, highlighting its superior label efficiency, which comes from exploiting the information in unlabeled data. Our code will be publicly available.
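
To make the query step more concrete, the following is a minimal sketch of cluster-then-pick-most-uncertain selection, which captures the diversity and uncertainty criteria described above. It is not the authors' implementation: a single flat k-means pass stands in for the progressive clustering of HAL, the uncertainty scores are assumed to be given, and the names `kmeans` and `select_queries` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialise centers from k distinct points of the pool.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def select_queries(features, uncertainty, budget, seed=0):
    """Return up to `budget` indices: the most uncertain sample per cluster.

    Clustering spreads the queries over the feature space (diversity);
    the per-cluster argmax favours samples the model is unsure about.
    """
    k = min(budget, len(features))
    assign = kmeans(features, k, seed=seed)
    picked = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:  # empty clusters contribute no query
            picked.append(max(members, key=lambda i: uncertainty[i]))
    return sorted(picked)


# Toy pool: three 2-D blobs standing in for encoder features (an assumption).
rng = random.Random(1)
features = [
    [cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)]
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(10)
]
uncertainty = [rng.random() for _ in features]
queries = select_queries(features, uncertainty, budget=3)
```

In a closed loop of the kind the abstract describes, the returned indices would be sent for annotation, moved from the unlabeled pool to the labeled set, and the model retrained before the next round of selection.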
