An Efficient Model-Agnostic Approach for Uncertainty Estimation in Data-Restricted Pedometric Applications

Viacheslav Barkov, Jonas Schmidinger, Robin Gebbers, Martin Atzmueller

TL;DR

This work tackles uncertainty estimation in pedometrics under data scarcity by transforming regression tasks into classification problems via a universal adapter. It enables the use of established classification models, notably TabPFN, and combines multiple binning strategies in an ensemble to output continuous predictions and uncertainty without requiring extra calibration data. Empirical results on soil organic carbon and pH from two German fields show competitive or improved probabilistic forecasts, as measured by CRPS, especially when paired with TabPFN or CatBoost. The approach broadens the methodological toolkit for digital soil mapping and supports more reliable, data-informed agricultural decisions in data-limited settings.

Abstract

This paper introduces a model-agnostic approach designed to enhance uncertainty estimation in the predictive modeling of soil properties, a crucial factor for advancing pedometrics and the practice of digital soil mapping. To address the data scarcity typical of soil studies, we present an improved technique for uncertainty estimation. The method transforms regression tasks into classification problems, which not only produces reliable uncertainty estimates but also enables the application of established machine learning algorithms with competitive performance that have not yet been used in pedometrics. Empirical results on datasets collected from two German agricultural fields demonstrate the practical application of the proposed methodology. Our results suggest that the proposed approach has the potential to provide better uncertainty estimation than the models commonly used in pedometrics.
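
To make the regression-to-classification idea summarized above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes quantile binning, bin representatives taken as the mean of the training targets in each bin, and an equal-weight mixture over several bin counts as the ensemble; none of these details are confirmed by the summary. The `KNNClassifier` class is a hypothetical toy stand-in for a real classifier such as TabPFN or CatBoost, and the data are synthetic. The CRPS is computed with the standard identity for a discrete predictive distribution, E|X - y| - 0.5 E|X - X'|.

```python
from bisect import bisect_right
from math import sqrt
from statistics import quantiles


class KNNClassifier:
    """Toy k-nearest-neighbour classifier; hypothetical stand-in for TabPFN or CatBoost."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, labels, n_classes):
        self.X, self.labels, self.n_classes = X, labels, n_classes
        return self

    def predict_proba(self, x):
        # Rank training points by squared Euclidean distance to x.
        order = sorted(
            range(len(self.X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], x)),
        )
        counts = [1e-3] * self.n_classes  # small smoothing: no class gets zero mass
        for i in order[: self.k]:
            counts[self.labels[i]] += 1.0
        total = sum(counts)
        return [c / total for c in counts]


def fit_binned(X, y, n_bins, make_classifier):
    """Discretize y into quantile bins and fit a classifier on the bin labels."""
    edges = quantiles(y, n=n_bins)                # n_bins - 1 interior cut points
    labels = [bisect_right(edges, v) for v in y]  # bin index in 0..n_bins-1
    centers = []
    for b in range(n_bins):
        members = [v for v, lab in zip(y, labels) if lab == b]
        # Assumption: a bin is represented by the mean of its training targets
        # (global mean as a fallback if the bin is empty).
        centers.append(sum(members) / len(members) if members else sum(y) / len(y))
    clf = make_classifier().fit(X, labels, n_bins)
    return clf, centers


def predict_distribution(models, x):
    """Equal-weight mixture of the discrete distributions from all binnings."""
    support = []
    for clf, centers in models:
        for c, p in zip(centers, clf.predict_proba(x)):
            support.append((c, p / len(models)))
    return support


def mean_and_std(support):
    """Continuous point prediction and its spread from (value, probability) pairs."""
    mean = sum(p * v for v, p in support)
    var = sum(p * (v - mean) ** 2 for v, p in support)
    return mean, sqrt(var)


def crps(support, y_true):
    """CRPS of a discrete distribution: E|X - y| - 0.5 * E|X - X'|."""
    term1 = sum(p * abs(v - y_true) for v, p in support)
    term2 = sum(p * q * abs(v - w) for v, p in support for w, q in support)
    return term1 - 0.5 * term2


# Synthetic data (not from the paper): the target grows linearly with one feature.
X_train = [[i / 10] for i in range(40)]
y_train = [2.0 * x[0] + 5.0 for x in X_train]

# Ensemble over several bin counts, one classifier per binning.
models = [fit_binned(X_train, y_train, n, KNNClassifier) for n in (3, 5, 8)]

support = predict_distribution(models, [2.0])
pred_mean, pred_std = mean_and_std(support)
score = crps(support, 9.0)
```

In this sketch the classifier only ever sees bin labels, so any model exposing class probabilities could be swapped in for `KNNClassifier`. The mixture over bin counts yields both a continuous point prediction (`pred_mean`) and an uncertainty estimate (`pred_std`) without a separate calibration set.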

Paper Structure

This paper contains 18 sections, 7 equations, 1 figure, 2 tables, and 1 algorithm.

Figures (1)

  • Figure 1: Digital soil map showing predicted soil pH values using TabPFN and model uncertainty estimates with the proposed method for the Boelingen dataset, with ordinary kriging applied to interpolate point predictions into rasters.