Towards Efficient Disaster Response via Cost-effective Unbiased Class Rate Estimation through Neyman Allocation Stratified Sampling Active Learning
Yanbing Bai, Xinyi Wu, Lai Xu, Jihan Pei, Erick Mas, Shunichi Koshimura
TL;DR
The paper tackles the challenge of estimating disaster-related class rates under label scarcity by integrating Neyman allocation stratified sampling with active learning. The proposed NSRS framework uses a stratified tree to adaptively partition the data and applies Neyman allocation to minimize the variance of the overall class-rate estimate while supporting iterative model improvement. Empirical results across MNIST, CIFAR-10, CINIC-10, and the xBD disaster dataset show that NSRS achieves unbiased, lower-variance class-rate estimates at about 30-60% of the annotation cost of simple sampling, and that it improves model performance, particularly as model complexity grows. The work demonstrates practical value for rapid disaster assessment and highlights the potential of combining NSRS with uncertainty-based strategies to further improve performance and robustness in label-constrained settings.
Abstract
With the rapid development of Earth observation technology, we have entered an era of massively available satellite remote-sensing data. However, much of this data is unlabeled, or labeling it is too costly, which limits the potential of AI for mining satellite data. The problem is especially acute in emergency response, where satellite data is used to evaluate the degree of disaster damage. Disaster damage assessment has hit a bottleneck because it focuses excessively on the damage to individual buildings in a specific geographical space or to a particular area on a larger scale. In fact, in the early stage of disaster emergency response, government departments are more concerned with the overall damage rate of the disaster area than with single-building damage, because this rate helps them decide the level of emergency response. We present an algorithm that constructs Neyman stratified random sampling trees for binary classification and extend this approach to multiclass problems. Extensive experiments on various datasets and model structures show that our method surpasses both passive and conventional active learning techniques in class rate estimation and model enhancement, with only 30%-60% of the annotation cost of simple sampling. It effectively addresses the 'sampling bias' challenge in traditional active learning strategies and mitigates the 'cold start' dilemma. We further validate the approach on disaster evaluation tasks using xView2 satellite imagery, demonstrating its practical utility in real-world contexts.
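To make the role of Neyman allocation concrete, the following is a minimal sketch of the textbook rule for a binary class rate, not the paper's tree-construction algorithm. Neyman allocation assigns each stratum h a share of the labeling budget proportional to N_h * S_h, where N_h is the stratum size and S_h is the within-stratum standard deviation. The function names, the two strata, and their assumed damage rates are hypothetical and chosen only for illustration. In practice S_h would have to be estimated, for example from a pilot sample or from model predictions.

```python
import math


def neyman_allocation(sizes, std_devs, budget):
    """Split `budget` labels across strata in proportion to N_h * S_h.

    Uses largest-remainder rounding so the allocation sums to `budget`.
    This sketch does not cap n_h at N_h or enforce a minimum per stratum.
    """
    scores = [n * s for n, s in zip(sizes, std_devs)]
    total = sum(scores)
    if total == 0:
        raise ValueError("all strata have zero spread; use another allocation rule")
    raw = [budget * sc / total for sc in scores]
    alloc = [math.floor(r) for r in raw]
    leftover = budget - sum(alloc)
    # Give the remaining labels to the strata with the largest fractional parts.
    by_fraction = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in by_fraction[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_variance(sizes, std_devs, alloc):
    """Variance of the stratified rate estimate, ignoring the finite-population correction.

    Assumes every stratum receives at least one label (n_h > 0).
    """
    n_total = sum(sizes)
    return sum((n / n_total) ** 2 * s ** 2 / m
               for n, s, m in zip(sizes, std_devs, alloc))


# Hypothetical example: two strata with assumed damage rates of 10% and 50%.
sizes = [8000, 2000]
assumed_rates = [0.1, 0.5]
# For a binary label, the within-stratum standard deviation is sqrt(p * (1 - p)).
std_devs = [math.sqrt(p * (1 - p)) for p in assumed_rates]
budget = 340

alloc = neyman_allocation(sizes, std_devs, budget)
proportional = [272, 68]  # the same budget split in proportion to stratum size

var_neyman = stratified_variance(sizes, std_devs, alloc)
var_proportional = stratified_variance(sizes, std_devs, proportional)
print(alloc, var_neyman, var_proportional)
```

Under these assumed rates, the smaller but more mixed stratum receives more labels than its size alone would justify, which is what lowers the variance of the overall rate estimate relative to proportional allocation at the same budget.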
