Impact Assessment of Missing Data in Model Predictions for Earth Observation Applications
Francisco Mena, Diego Arenas, Marcela Charfuelan, Marlon Nuske, Andreas Dengel
TL;DR
This paper investigates how missing data from multiple Earth Observation views affects predictive performance across four datasets involving classification and regression tasks. It evaluates several fusion strategies (Impute, Exemplar, Ignore, Ensemble) under progressively severe missing-view scenarios to quantify robustness and identify reliable approaches. Key findings show that the optical view is particularly critical, regression tasks are more sensitive than classification, and ensemble-based aggregation often offers the best robustness, sometimes approaching or achieving full robustness. The work provides practical recommendations for model selection under missing-view conditions and highlights directions for designing training-time adaptations to improve resilience to data unavailability in EO pipelines.
Abstract
Earth observation (EO) applications involving complex and heterogeneous data sources are commonly approached with machine learning models. However, it is commonly assumed that data sources will be persistently available. In practice, factors such as noise, clouds, or satellite mission failures can affect the availability of EO sources. In this work, we assess the impact of missing temporal and static EO sources on trained models across four datasets with classification and regression tasks. We compare the predictive quality of different methods and find that some are naturally more robust to missing data. The Ensemble strategy, in particular, achieves a prediction robustness of up to 100%. We show that missing-data scenarios are significantly more challenging in regression than in classification tasks. Finally, we find that the optical view is the most critical when views are missing individually.
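
To make the idea of ensemble-based aggregation under missing views concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each view has its own model producing class probabilities and that the ensemble averages over whichever views are available. The function name `ensemble_predict`, the view names other than "optical", and all probability values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence


def ensemble_predict(view_probs: Dict[str, Optional[Sequence[float]]]) -> List[float]:
    """Average per-view class probabilities, skipping missing views (None).

    Assumption: one model per view, each emitting a probability vector
    of the same length. This is an illustrative sketch only.
    """
    available = [p for p in view_probs.values() if p is not None]
    if not available:
        raise ValueError("at least one view must be available")
    n_classes = len(available[0])
    if any(len(p) != n_classes for p in available):
        raise ValueError("all views must predict the same number of classes")
    # Mean over available views only, so a missing view changes the
    # denominator instead of injecting an imputed value.
    return [sum(p[c] for p in available) / len(available) for c in range(n_classes)]


# Hypothetical per-view predictions for a single sample (two classes).
full = {"optical": [0.7, 0.3], "radar": [0.6, 0.4], "weather": [0.5, 0.5]}
no_optical = {**full, "optical": None}  # simulate a missing optical view

p_full = ensemble_predict(full)
p_missing = ensemble_predict(no_optical)

# The predicted class is the index of the largest averaged probability.
label_full = max(range(len(p_full)), key=p_full.__getitem__)
label_missing = max(range(len(p_missing)), key=p_missing.__getitem__)
```

In this toy case the predicted class is unchanged when the optical view is dropped, which is the kind of behaviour a robustness evaluation would measure across a whole dataset. Whether a real ensemble stays this stable depends on how much each view contributes.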
