Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?

Viacheslav Barkov; Jonas Schmidinger; Robin Gebbers; Martin Atzmueller

TL;DR

A comprehensive benchmark of state-of-the-art ANN architectures for tabular data, covering the latest multilayer perceptron (MLP)-based models (TabM, RealMLP), attention-based transformer variants (FT-Transformer, ExcelFormer), retrieval-augmented approaches (TabR, ModernNCA), and an in-context learning foundation model (TabPFN), reveals that modern ANNs consistently outperform classical methods on the majority of tasks.

Abstract

In pedometrics, tabular machine learning is the predominant method for predicting soil properties from remote and proximal soil sensing data, and it forms a central component of Digital Soil Mapping (DSM). At the field scale, this predictive soil modeling (PSM) task is typically constrained by small training sample sizes and, in soil spectroscopy, by high feature-to-sample ratios. These conditions have traditionally proven challenging for conventional deep learning methods. Classical machine learning algorithms, particularly tree-based models such as Random Forest and linear models such as Partial Least Squares Regression, have therefore long been the default choice for pedometric modeling within DSM. Recent advances in artificial neural networks (ANNs) for tabular data challenge this view, yet their suitability for field-scale DSM has not been proven. We introduce a comprehensive benchmark that evaluates state-of-the-art ANN architectures, including the latest multilayer perceptron (MLP)-based models (TabM, RealMLP), attention-based transformer variants (FT-Transformer, ExcelFormer, T2G-Former, AMFormer), retrieval-augmented approaches (TabR, ModernNCA), and an in-context learning foundation model (TabPFN). Our evaluation encompasses 31 field- and farm-scale datasets containing 30 to 460 soil samples and three critical soil properties: soil organic matter or soil organic carbon, pH, and clay content. Our results reveal that modern ANNs consistently outperform classical methods on the majority of tasks, demonstrating that deep learning has matured sufficiently to overcome the long-standing dominance of classical machine learning in pedometrics. Notably, TabPFN delivers the strongest overall performance and is robust across varying conditions. We therefore recommend the adoption of modern ANNs for field-scale DSM and propose TabPFN as the new default choice in the toolkit of every pedometrician.
