Uncertainty-aware Bayesian machine learning modelling of land cover classification

Samuel Bilson, Anna Pustogvar

TL;DR

The paper addresses the neglect of input measurement uncertainty in remote-sensing land cover classification and argues for uncertainty-aware probabilistic predictions to support traceability. It introduces a Bayesian framework based on an Errors-in-Variables approach and Bayesian quadratic discriminant analysis (QDA): the true class-conditional inputs are modeled as $p(\pmb{\chi}|\pmb{\theta}_k)$, the observed inputs follow a Gaussian measurement-error model $p(\mathbf{x}|\pmb{\chi},\pmb{\zeta})$, and a Normal-Inverse-Wishart prior yields a closed-form posterior predictive. In experiments on Sentinel-2-based land cover datasets, Bayesian QDA provides interpretable class statistics, robust performance across years and data sizes, and favorable computational efficiency compared with random forests (RF) and neural networks (NN). The results support metrology-grade traceability by delivering calibrated class probabilities and explicit handling of input uncertainty. Future work includes non-Gaussian uncertainties and label uncertainty.
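
For orientation, the closed form mentioned above can be illustrated with the textbook conjugate result for a Gaussian class model under a Normal-Inverse-Wishart (NIW) prior. This is a sketch, not the paper's own derivation: it assumes $\pmb{\theta}_k = (\pmb{\mu}_k, \pmb{\Sigma}_k)$, treats the true inputs $\pmb{\chi}$ as directly observed (the measurement-error layer $p(\mathbf{x}|\pmb{\chi},\pmb{\zeta})$ is left out), and the hyperparameter names $\mathbf{m}_0, \kappa_0, \nu_0, \mathbf{S}_0$ are introduced here for illustration. With $n_k$ training inputs of class $k$ in $d$ dimensions and sample mean $\bar{\pmb{\chi}}_k$:

$$
\begin{aligned}
(\pmb{\mu}_k, \pmb{\Sigma}_k) &\sim \mathrm{NIW}(\mathbf{m}_0, \kappa_0, \nu_0, \mathbf{S}_0), \qquad \pmb{\chi} \mid \pmb{\mu}_k, \pmb{\Sigma}_k \sim \mathcal{N}(\pmb{\mu}_k, \pmb{\Sigma}_k), \\
\kappa_n &= \kappa_0 + n_k, \qquad \nu_n = \nu_0 + n_k, \qquad \mathbf{m}_n = \frac{\kappa_0 \mathbf{m}_0 + n_k \bar{\pmb{\chi}}_k}{\kappa_n}, \\
\mathbf{S}_n &= \mathbf{S}_0 + \sum_{i=1}^{n_k} (\pmb{\chi}_i - \bar{\pmb{\chi}}_k)(\pmb{\chi}_i - \bar{\pmb{\chi}}_k)^\top + \frac{\kappa_0 n_k}{\kappa_n} (\bar{\pmb{\chi}}_k - \mathbf{m}_0)(\bar{\pmb{\chi}}_k - \mathbf{m}_0)^\top, \\
p(\pmb{\chi}^* \mid y^* = k, \mathcal{D}) &= t_{\nu_n - d + 1}\!\left(\pmb{\chi}^* \,\middle|\, \mathbf{m}_n,\ \frac{\kappa_n + 1}{\kappa_n (\nu_n - d + 1)}\, \mathbf{S}_n\right).
\end{aligned}
$$

The predictive is a multivariate Student-t rather than a Gaussian because the uncertainty in $\pmb{\mu}_k$ and $\pmb{\Sigma}_k$ has been integrated out; its tails become lighter as $n_k$ grows.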

Abstract

Land cover classification involves the production of land cover maps, which determine the type of land through remote sensing imagery. In recent years, such classification has been performed by machine learning classification models, which can give highly accurate predictions on land cover per pixel using large quantities of input training data. However, such models do not currently take account of input measurement uncertainty, which is vital for traceability in metrology. In this work we propose a Bayesian classification framework using generative modelling to take account of input measurement uncertainty. We take the specific case of Bayesian quadratic discriminant analysis, and apply it to land cover datasets from Copernicus Sentinel-2 in 2020 and 2021. We benchmark the performance of the model against more popular classification models used in land cover maps, such as random forests and neural networks. We find that such Bayesian models are more trustworthy, in the sense that they are more interpretable, explicitly model the input measurement uncertainty, and maintain predictive performance of class probability outputs across datasets of different years and sizes, whilst also being computationally efficient.
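
To make the idea of generative Bayesian QDA with class probability outputs concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it uses the standard Normal-Inverse-Wishart conjugate update with a Student-t posterior predictive per class, it treats inputs as noise-free (the paper's measurement-error model is not reproduced), it uses empirical class frequencies as class priors, and all function and parameter names (`niw_posterior`, `fit`, `predict_proba`, `m0`, `kappa0`, `nu0`, `S0`) are hypothetical.

```python
import math
import numpy as np


def niw_posterior(X, m0, kappa0, nu0, S0):
    """Standard conjugate NIW update for one class from its rows X (n x d)."""
    n, _ = X.shape
    xbar = X.mean(axis=0)
    centred = X - xbar
    scatter = centred.T @ centred            # sum of outer products about the mean
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    m_n = (kappa0 * m0 + n * xbar) / kappa_n
    diff = (xbar - m0)[:, None]
    S_n = S0 + scatter + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return m_n, kappa_n, nu_n, S_n


def student_t_logpdf(x, loc, shape, df):
    """Log density of a multivariate Student-t with scale matrix `shape`."""
    d = loc.shape[0]
    dev = x - loc
    maha = dev @ np.linalg.solve(shape, dev)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(shape)
    return (math.lgamma((df + d) / 2) - math.lgamma(df / 2)
            - 0.5 * d * math.log(df * math.pi) - 0.5 * logdet
            - 0.5 * (df + d) * math.log1p(maha / df))


def fit(X, y, m0, kappa0, nu0, S0):
    """Fit one Student-t posterior predictive per class label in y."""
    d = X.shape[1]
    model = {}
    for k in np.unique(y):
        Xk = X[y == k]
        m_n, kappa_n, nu_n, S_n = niw_posterior(Xk, m0, kappa0, nu0, S0)
        df = nu_n - d + 1
        scale = S_n * (kappa_n + 1) / (kappa_n * df)
        # Assumption: class prior is the empirical class frequency.
        log_prior = math.log(len(Xk) / len(X))
        model[k] = (log_prior, m_n, scale, df)
    return model


def predict_proba(model, x):
    """Class probabilities for one input x, in the order of the fitted labels."""
    logp = np.array([lp + student_t_logpdf(x, m, s, df)
                     for lp, m, s, df in model.values()])
    logp -= logp.max()                        # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum()
```

Fitting costs one pass over each class's data and a handful of small matrix operations, and the fitted quantities (posterior mean and scale matrix per class) can be read directly as class statistics, which is the kind of interpretability and computational efficiency the abstract refers to.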

Paper Structure

This paper contains 25 sections, 43 equations, 10 figures.

Figures (10)

  • Figure 1: Labelled data collected over the area of interest (AOI) superimposed over Sentinel-2 RGB image from June 1st, 2020
  • Figure 2: Distribution of LC classes within AOI shown in Figure 1.
  • Figure 3: One realisation of dimensionally reduced LC map data using PCA.
  • Figure 4: Bar charts of input BOA reflectance data statistics, with sample mean $\bar{x}$ and sample standard deviation $s_x$ for each reflectance band and LC class. Error bars show $\bar{x}\pm2s_x$, using all realisations of training data, over years 2020 and 2021.
  • Figure 5: Confusion matrices of models trained using 10 % of pixels from year 2020 and validated on the remaining 90 % of pixels from year 2021. True/false positive rates (bottom rows) and true/false negative rates (right columns) are also included.
  • ...and 5 more figures