Hide and Seek: Investigating Redundancy in Earth Observation Imagery

Tasos Papazafeiropoulos, Nikolaos Ioannis Bountos, Nikolas Papadopoulos, Ioannis Papoutsis

Abstract

The growing availability of Earth Observation (EO) data and recent advances in Computer Vision have driven rapid progress in machine learning for EO, producing domain-specific models at ever-increasing scales. Yet this progress risks overlooking fundamental properties of EO data that distinguish it from other domains. We argue that EO data exhibit a multidimensional redundancy (spectral, temporal, spatial, and semantic) that affects the domain and its applications more strongly than the current literature reflects. To validate this hypothesis, we conduct a systematic, domain-specific investigation of the existence, consistency, and practical implications of this phenomenon across key dimensions of EO variability. Our findings confirm that redundancy in EO data is both substantial and pervasive: exploiting it yields comparable performance ($\approx98.5\%$ of baseline) at a fraction of the computational cost ($\approx4\times$ fewer GFLOPs), at both training and inference. Crucially, these gains are consistent across tasks, geospatial locations, sensors, ground sampling distances, and architectural designs, suggesting that multi-faceted redundancy is a structural property of EO data rather than an artifact of specific experimental choices. These results lay the groundwork for more efficient, scalable, and accessible large-scale EO models.

Paper Structure (19 sections, 1 equation, 10 figures, 2 tables, 3 algorithms)

Figures (10)

  • Figure 1: Overview of RViT and RViT-UperNet for classification and semantic segmentation.
  • Figure 2: Examination of redundancy reduction during training across datasets. The x-axis indicates the retention ratio. For the thresholded-diversity-based masking strategy, we show the similarity thresholds that yield, on average, similar retention ratios.
  • Figure 3: Examination of redundancy reduction applied directly at inference, for models trained with and without masking. Models trained with the same retention ratio are grouped on the x-axis; color indicates the inference retention ratio.
  • Figure 4: Examination of GSD impact on the robustness of redundancy reduction for MLRSNet and Flair.
  • Figure 5: Linear probing of redundancy-aware models trained with varying retention ratios, to assess their generalization capacity.
  • ...and 5 more figures
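
The captions above refer to a retention ratio (the fraction of input tokens kept) and to similarity thresholds for a thresholded-diversity masking strategy. The following is a minimal sketch of how two such token-selection rules could look, not the paper's implementation: the function names, the use of cosine similarity, and the greedy selection order are all assumptions made for illustration.

```python
import numpy as np


def random_keep(num_tokens, retention_ratio, rng):
    """Hypothetical helper: indices of a random subset of tokens.

    Keeps round(retention_ratio * num_tokens) tokens (at least one).
    """
    k = max(1, int(round(retention_ratio * num_tokens)))
    return np.sort(rng.choice(num_tokens, size=k, replace=False))


def thresholded_diversity_keep(tokens, sim_threshold):
    """Hypothetical helper: greedy similarity-thresholded selection.

    A token is kept only if its cosine similarity to every token kept
    so far is below sim_threshold (assumed rule, for illustration).
    tokens: array of shape (num_tokens, dim).
    """
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-12, None)  # avoid division by zero
    kept = [0]  # always keep the first token
    for i in range(1, unit.shape[0]):
        sims = unit[kept] @ unit[i]
        if np.max(sims) < sim_threshold:
            kept.append(i)
    return np.array(kept)


if __name__ == "__main__":
    # Toy input: 8 distinct patterns, each repeated 4 times (32 tokens).
    tokens = np.repeat(np.eye(8), 4, axis=0)
    rng = np.random.default_rng(0)

    idx_random = random_keep(len(tokens), retention_ratio=0.25, rng=rng)
    idx_diverse = thresholded_diversity_keep(tokens, sim_threshold=0.9)

    print("random keep:", len(idx_random), "tokens")
    print("diversity keep:", len(idx_diverse), "tokens,",
          "effective retention ratio", len(idx_diverse) / len(tokens))
```

In this sketch the random rule fixes the retention ratio directly, while the thresholded rule fixes a similarity threshold and the retention ratio follows from the data, which is consistent with the Figure 2 caption reporting thresholds that give similar retention ratios on average.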