Scale-Aware Recognition in Satellite Images under Resource Constraints

Shreelekha Revankar, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala

TL;DR

The paper tackles scale-aware recognition in satellite imagery under fixed acquisition budgets by combining three components: (i) an LLM-based approach that infers the best-suited resolution for each concept, (ii) knowledge distillation (KD) from high-resolution (HR) to low-resolution (LR) models, so that finer concepts can be recognized from LR data alone, and (iii) a disagreement-driven strategy that acquires HR imagery only where it yields the most benefit. Guided by the budget and the predicted concept scale, the system retrieves concepts by switching between an LR model, an HR model, and a knowledge-distilled LR model, achieving up to a 26.3% improvement over HR-only baselines while using 76.3% fewer HR images. Experiments on Sentinel-2–NAIP and Sentinel-2–NICFI benchmarks show that the approach outperforms baselines in both zero-shot and supervised settings, with strong LR performance attributed to distillation and targeted HR sampling. The work targets practical, scalable recognition across diverse geographic regions and modalities, offering a cost-effective framework for large-scale, open-vocabulary satellite imagery analysis.

Abstract

Recognition of features in satellite imagery (forests, swimming pools, etc.) depends strongly on the spatial scale of the concept and therefore the resolution of the images. This poses two challenges: Which resolution is best suited for recognizing a given concept, and where and when should the costlier higher-resolution (HR) imagery be acquired? We present a novel scheme to address these challenges by introducing three components: (1) A technique to distill knowledge from models trained on HR imagery to recognition models that operate on imagery of lower resolution (LR), (2) a sampling strategy for HR imagery based on model disagreement, and (3) an LLM-based approach for inferring concept "scale". With these components we present a system to efficiently perform scale-aware recognition in satellite imagery, improving accuracy over single-scale inference while following budget constraints. Our novel approach offers up to a 26.3% improvement over entirely HR baselines, using 76.3% fewer HR images.
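To make the idea of disagreement-based sampling under a budget concrete, here is a minimal sketch. It is not the authors' implementation: the function name `select_hr_regions`, the use of an absolute score difference as the disagreement measure, and the interpretation of the budget as a count of HR regions are all assumptions made for illustration.

```python
import numpy as np


def select_hr_regions(scores_a, scores_b, budget):
    """Pick the regions where two models disagree most, up to `budget` regions.

    scores_a, scores_b: per-region concept scores from two models
    (hypothetical inputs; the disagreement measure below is an assumption).
    budget: maximum number of regions for which HR imagery may be acquired.
    Returns region indices ordered from most to least disagreement.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if scores_a.shape != scores_b.shape:
        raise ValueError("both models must score the same set of regions")

    # Assumed disagreement measure: absolute difference of the two scores.
    disagreement = np.abs(scores_a - scores_b)

    # Stable sort on the negated values gives descending order and keeps
    # the original region order among ties.
    order = np.argsort(-disagreement, kind="stable")
    return order[: max(int(budget), 0)]


# Usage: four regions, budget for two HR acquisitions.
picked = select_hr_regions([0.75, 0.25, 0.5, 0.125], [0.5, 1.0, 0.5, 0.25], budget=2)
print(picked.tolist())  # [1, 0]
```

Regions outside the selected set would be handled with LR imagery only, which is where the budget saving in this sketch comes from.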

Paper Structure

This paper contains 34 sections, 5 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: These images show how concept scale is linked to spatial resolution. When seeking a spatially large concept such as a forest, lower resolutions are favored (b), as higher resolutions may lack the context needed to distinguish a forest (a) from a park (c). Conversely, when seeking finer concepts such as a sports track, certain details can only be discerned well at higher resolutions (d) and are obscured at lower resolutions (e).
  • Figure 2: System overview. First, we determine which resolution is best suited for the search concept based on its scale (sec. \ref{subsec:LLM}). Then, we analyze the search area to find which regions would benefit the most from higher-resolution inference (sec. \ref{subsec:analysis}). We sample the best-suited regions while staying within a user-specified budget. Based on this guidance, we perform inference using one of three models: a high-resolution satellite model, a low-resolution satellite model, and a low-resolution satellite model with knowledge distilled from its high-resolution counterpart (sec. \ref{subsec:KD}). This knowledge-distilled model allows us to infer finer details using low-resolution satellite imagery alone.
  • Figure 3: Images ranked according to disagreement between the LR and HR model (top) and the LR and KD model (bottom). Both rankings are similar, with a correlation coefficient of 0.9322, even though the latter only uses LR images.
  • Figure 3: Performance of the unsupervised models for recognition at low resolution. Our knowledge-distilled models perform better on both seen and unseen concepts (corresponding HR models in grey; references included in Tab. \ref{tab:overall_graft}).
  • Figure 4: Performance when using our model disagreement-based sampling strategy. Our approach consistently yields higher precision across all budgets and across all values of $K$.
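
The captions above describe a system that chooses among three models per region and is evaluated by precision at different values of $K$. The sketch below illustrates one way such routing and scoring could look. It is an assumption-laden illustration, not the paper's method: the names `Model`, `route_regions`, and `precision_at_k` are hypothetical, the routing rule (LR everywhere for coarse concepts; HR on sampled regions and the knowledge-distilled LR model elsewhere for fine concepts) is an assumed reading of the overview, and precision is assumed to mean the fraction of relevant items among the top $K$ ranked results.

```python
from enum import Enum


class Model(Enum):
    LR = "low-resolution"
    KD = "knowledge-distilled low-resolution"
    HR = "high-resolution"


def route_regions(num_regions, concept_is_fine, hr_region_ids):
    """Assign one model per region (assumed routing rule, for illustration).

    concept_is_fine: whether the concept was judged to need fine detail.
    hr_region_ids: regions selected for HR acquisition within the budget.
    """
    hr_regions = set(hr_region_ids)
    plan = []
    for region in range(num_regions):
        if not concept_is_fine:
            plan.append(Model.LR)   # coarse concept: LR imagery suffices
        elif region in hr_regions:
            plan.append(Model.HR)   # fine concept, HR imagery was acquired
        else:
            plan.append(Model.KD)   # fine concept, LR imagery only
    return plan


def precision_at_k(scores, relevant, k):
    """Fraction of the k highest-scoring items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if relevant[i]) / k


# Usage: a fine-grained concept with HR imagery for regions 1 and 3.
plan = route_regions(4, concept_is_fine=True, hr_region_ids=[1, 3])
print([m.name for m in plan])  # ['KD', 'HR', 'KD', 'HR']
print(precision_at_k([0.9, 0.1, 0.8, 0.3], [True, False, False, True], k=2))  # 0.5
```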