Probabilistic Machine Learning for Noisy Labels in Earth Observation

Spyros Kondylatos, Nikolaos Ioannis Bountos, Ioannis Prapas, Angelos Zavras, Gustau Camps-Valls, Ioannis Papoutsis

TL;DR

This study tackles label noise in Earth Observation by adopting a probabilistic ML framework that models input-dependent label noise and yields aleatoric uncertainty estimates. The approach places a noise-augmented latent term in the logits and uses MC sampling with a tempered softmax to produce predictions and per-sample uncertainties, with the network learning both mean and noise terms. Across four EO applications (LULC, landslides, volcanic activity, and wildfires), the probabilistic models generally improve performance and provide reliable uncertainty footprints, validated via Discard Tests and uncertainty-density analyses. The findings argue for uncertainty-aware EO ML as a route to more trustworthy, interpretable, and decision-supportive remote sensing systems, while noting the current focus on aleatoric uncertainty and plans to extend to epistemic uncertainty in future work.
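
As a rough illustration of the sampling step described above, the sketch below draws Gaussian noise around a mean logit vector, passes each draw through a tempered softmax, and reports the mean and variance of the resulting probabilities. It is not the authors' implementation: the names `tempered_softmax`, `mc_predict`, `mean_logits`, and `sigma_logits` are hypothetical, the two lists stand in for the two network outputs for a single input, and a softmax over mutually exclusive classes is assumed (a multi-label task would presumably use a sigmoid instead).

```python
import math
import random


def tempered_softmax(logits, tau):
    """Softmax of logits / tau, computed in a numerically stable way."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predict(mean_logits, sigma_logits, tau=1.0, n_samples=1000, seed=0):
    """Monte Carlo estimate of class probabilities and their variance.

    mean_logits / sigma_logits are stand-ins (assumed) for the predicted
    mean and input-dependent noise scale of each class logit.
    """
    rng = random.Random(seed)
    n_classes = len(mean_logits)
    samples = []
    for _ in range(n_samples):
        # One noisy logit vector: mean plus Gaussian noise per class.
        noisy = [rng.gauss(mu, sd) for mu, sd in zip(mean_logits, sigma_logits)]
        samples.append(tempered_softmax(noisy, tau))
    # Prediction: mean of the sampled probabilities.
    mean_p = [sum(s[c] for s in samples) / n_samples for c in range(n_classes)]
    # Uncertainty: variance of the sampled probabilities.
    var_p = [
        sum((s[c] - mean_p[c]) ** 2 for s in samples) / n_samples
        for c in range(n_classes)
    ]
    return mean_p, var_p


if __name__ == "__main__":
    confident = mc_predict([2.0, 0.0], [0.1, 0.1])
    noisy = mc_predict([2.0, 0.0], [3.0, 3.0])
    print("low-noise input :", confident)
    print("high-noise input:", noisy)
```

With a larger noise scale the sampled probabilities spread out, so the per-class variance grows; with zero noise the procedure reduces to an ordinary tempered softmax.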

Abstract

Label noise poses a significant challenge in Earth Observation (EO), often degrading the performance and reliability of supervised Machine Learning (ML) models. Yet, given the critical nature of several EO applications, developing robust and trustworthy ML solutions is essential. In this study, we take a step in this direction by leveraging probabilistic ML to model input-dependent label noise and quantify data uncertainty in EO tasks, accounting for the unique noise sources inherent in the domain. We train uncertainty-aware probabilistic models across a broad range of high-impact EO applications, spanning diverse noise sources, input modalities, and ML configurations, and introduce a dedicated pipeline to assess their accuracy and reliability. Our experimental results show that the uncertainty-aware models consistently outperform the standard deterministic approaches across most datasets and evaluation metrics. Moreover, through rigorous uncertainty evaluation, we validate the reliability of the predicted uncertainty estimates, enhancing the interpretability of model predictions. Our findings emphasize the importance of modeling label noise and incorporating uncertainty quantification in EO, paving the way for more accurate, reliable, and trustworthy ML solutions in the field.

Paper Structure

This paper contains 28 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: (A) The categorization of Label Noise Sources (NS) in Earth Observation (EO). (B) The uncertainty-aware Machine Learning (ML) pipeline proposed in this study for modeling label noise in EO. A Normal distribution is placed on the logits of a Neural Network, and the model is trained to predict its mean $f_{c}^{w}(x)$ as the output and $\sigma_{c}^{w}(x)$ as its heteroscedastic uncertainty. Monte Carlo (MC) sampling draws multiple samples from this distribution, from which the final model prediction (mean of the samples) and its associated uncertainty (variance of the samples) are estimated. A temperature parameter $\tau$ scales the logits for a tempered softmax calculation. Model performance is assessed using standard evaluation metrics ($F_1$ score and Area Under the Precision-Recall Curve (AUPRC)), while uncertainty estimates are assessed using dedicated uncertainty evaluation methods (Discard Test and Uncertainty Density plots) and visualizations.
  • Figure 2: Examples of data samples from the datasets used in this study, highlighting sources of label noise. (A) BigEarthNet: Discrepancies in the labeling strategies of the Corine Land Cover database introduce label noise. (B) Landslides dataset: Misalignments between in-situ annotated masks and Earth Observation images contribute to labeling inconsistencies. (C) Hephaestus: Atmospheric contributions and coherence variations complicate annotation, leading to labels with heteroscedastic noise. (D) Wildfires dataset: Label noise stems from the stochastic nature of wildfire occurrence, where similar environmental conditions do not always lead to the same target class.
  • Figure 3: Discard test plots across all tasks. A reliable model should exhibit a decreasing error trend as the discard fraction increases, indicating that the most uncertain samples correspond to higher loss values. The Monotonicity Fraction (MF) measures how often the error decreases when uncertain samples are discarded, while the Discard Improvement (DI) quantifies the average reduction in model error as the discard fraction increases. LULC refers to Land Use Land Cover.
  • Figure 4: Uncertainty density plots across all tasks, presented for all classes combined and separately for the positive and negative classes. Vertical dashed lines indicate the median uncertainty for each group. Reliable uncertainty estimates are characterized by distinct, well-separated distributions with minimal overlap. LULC refers to Land Use Land Cover.
  • Figure 5: Samples with low and high uncertainty for the volcanic activity detection and land use land cover scene classification tasks.
  • ...and 2 more figures
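
The Discard Test summarized in the Figure 3 caption can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the paper's evaluation code: the function names are hypothetical, the error is taken to be the mean per-sample loss of the retained samples, equal-sized discard steps are used, MF counts a step as an improvement only when the error strictly decreases, and DI is the mean drop in error between consecutive discard fractions.

```python
def discard_test(losses, uncertainties, n_steps=10):
    """Mean loss of the retained samples at each discard fraction.

    Step k discards the k/n_steps most uncertain samples (k = 0 keeps all).
    """
    # Indices sorted from most certain to most uncertain.
    order = sorted(range(len(losses)), key=lambda i: uncertainties[i])
    n = len(order)
    errors = []
    for k in range(n_steps):
        keep = n - (k * n) // n_steps  # number of samples retained
        kept = order[:keep]
        errors.append(sum(losses[i] for i in kept) / len(kept))
    return errors


def monotonicity_fraction(errors):
    """Fraction of consecutive steps in which the error decreases."""
    steps = list(zip(errors, errors[1:]))
    return sum(1 for a, b in steps if b < a) / len(steps)


def discard_improvement(errors):
    """Average reduction in error from one discard fraction to the next."""
    steps = list(zip(errors, errors[1:]))
    return sum(a - b for a, b in steps) / len(steps)


if __name__ == "__main__":
    # Toy data: uncertainty perfectly tracks the loss.
    losses = [0.1 * i for i in range(1, 11)]
    uncertainties = list(losses)
    errors = discard_test(losses, uncertainties, n_steps=5)
    print("errors:", errors)
    print("MF:", monotonicity_fraction(errors))
    print("DI:", discard_improvement(errors))
```

When uncertainty is informative, discarding the most uncertain samples removes the highest-loss cases first, so the error curve falls, MF approaches 1, and DI is positive. Uninformative or inverted uncertainty estimates give a flat or rising curve.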