Scale-Aware Recognition in Satellite Images under Resource Constraints
Shreelekha Revankar, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala
TL;DR
The paper tackles scale-aware recognition in satellite imagery under fixed acquisition budgets by marrying three components: (i) an LLM-based approach to infer the optimal resolution for each concept, (ii) knowledge distillation from high-resolution (HR) to low-resolution (LR) models to enable finer recognition with LR data, and (iii) a disagreement-driven strategy to selectively acquire HR imagery where it yields the most benefit. The proposed system retrieves concepts by switching between LR-only inference, HR-based evaluation, and LR-based KD, guided by budget constraints and predicted concept scale, achieving up to 26.3% relative improvement over HR-only baselines while using far fewer HR images. Experiments on Sentinel-2–NAIP and Sentinel-2–NICFI benchmarks show that the approach outperforms baselines in both zero-shot and supervised settings, with strong LR performance thanks to distillation and targeted HR sampling. The work demonstrates practical, scalable recognition across diverse geographic regions and modalities, offering a cost-effective framework for large-scale, open-vocabulary satellite imagery analysis.
Abstract
Recognition of features in satellite imagery (forests, swimming pools, etc.) depends strongly on the spatial scale of the concept and therefore the resolution of the images. This poses two challenges: Which resolution is best suited for recognizing a given concept, and where and when should the costlier higher-resolution (HR) imagery be acquired? We present a novel scheme to address these challenges by introducing three components: (1) A technique to distill knowledge from models trained on HR imagery to recognition models that operate on imagery of lower resolution (LR), (2) a sampling strategy for HR imagery based on model disagreement, and (3) an LLM-based approach for inferring concept "scale". With these components we present a system to efficiently perform scale-aware recognition in satellite imagery, improving accuracy over single-scale inference while following budget constraints. Our novel approach offers up to a 26.3% improvement over entirely HR baselines, using 76.3% fewer HR images.
