Machine Learning Models for Soil Parameter Prediction Based on Satellite, Weather, Clay and Yield Data
Calvin Kammerlander, Viola Kolb, Marinus Luegmair, Lou Scheermann, Maximilian Schmailzl, Marco Seufert, Jiayun Zhang, Denis Dalic, Torsten Schön
TL;DR
This work tackles the challenge of predicting soil nutrient levels without laboratory tests by combining satellite imagery, weather, and ancillary data within a regression framework. A two-phase approach first builds a European baseline model from Sentinel-2 imagery and LUCAS TOPSOIL data, then enriches its predictions with weather data, yield proxies, and Clay embeddings. Three ML algorithms (XGBoost, FCNN, Random Forest) are compared, complemented by spatial cross-validation. Key contributions include a detailed data pipeline, model comparisons, and an extended feature analysis showing the value and limits of high-dimensional Clay embeddings, with soil-property predictions achieving competitive RMSEs and insights into feature importance. The study lays a scalable, reproducible foundation for precision fertilization in Africa and other under-resourced regions, while highlighting data gaps, notably the lack of timestamped African soil observations, that must be addressed to realize full generalization and impact.
Abstract
Efficient nutrient management and precise fertilization are essential for advancing modern agriculture, particularly in regions striving to optimize crop yields sustainably. The AgroLens project addresses this challenge by developing Machine Learning (ML)-based methodologies to predict soil nutrient levels without reliance on laboratory tests. By leveraging state-of-the-art techniques, the project lays a foundation for actionable insights to improve agricultural productivity in resource-constrained areas, such as Africa. The approach begins with the development of a robust European model using the LUCAS Soil dataset and Sentinel-2 satellite imagery to estimate key soil properties, including phosphorus, potassium, nitrogen, and pH levels. This model is then enhanced by integrating supplementary features, such as weather data, harvest rates, and Clay AI-generated embeddings. This report details the methodological framework, data preprocessing strategies, and ML pipelines employed in this project. Advanced algorithms, including Random Forests, Extreme Gradient Boosting (XGBoost), and Fully Connected Neural Networks (FCNN), were implemented and fine-tuned for precise nutrient prediction. Results showcase robust model performance, with root mean square error values meeting stringent accuracy thresholds. By establishing a reproducible and scalable pipeline for soil nutrient prediction, this research paves the way for transformative agricultural applications, including precision fertilization and improved resource allocation in under-resourced regions like Africa.
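To make the evaluation idea concrete, the following is a minimal sketch of how a regression model could be scored by RMSE under spatial cross-validation, where samples from the same geographic block never appear in both the training and test folds. It is not the AgroLens pipeline: the data are synthetic, the 12 feature columns only stand in for satellite band values, the target is a toy pH-like quantity, and the helper names (`spatial_block_ids`, `rmse`, `spatial_cv_rmse`) as well as the 2-degree block size are assumptions chosen for illustration. It uses scikit-learn's `RandomForestRegressor` and `GroupKFold`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold


def spatial_block_ids(lon, lat, block_deg=1.0):
    """Assign each sample to a square lon/lat block (hypothetical helper)."""
    col = np.floor(np.asarray(lon) / block_deg).astype(int)
    row = np.floor(np.asarray(lat) / block_deg).astype(int)
    # One integer label per distinct (row, col) block.
    _, ids = np.unique(np.stack([row, col], axis=1), axis=0, return_inverse=True)
    return ids.ravel()


def rmse(y_true, y_pred):
    """Root mean square error between two equally sized sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def spatial_cv_rmse(X, y, groups, n_splits=5, seed=0):
    """Per-fold RMSE with folds split by spatial block, not by sample."""
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        scores.append(rmse(y[test_idx], model.predict(X[test_idx])))
    return scores


# Synthetic stand-in data (assumption: not real LUCAS or Sentinel-2 values).
rng = np.random.default_rng(0)
n = 300
lon = rng.uniform(5.0, 15.0, n)
lat = rng.uniform(45.0, 55.0, n)
X = rng.normal(size=(n, 12))  # 12 placeholder "band" features
y = 6.5 + 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.2, size=n)

groups = spatial_block_ids(lon, lat, block_deg=2.0)
fold_rmse = spatial_cv_rmse(X, y, groups)
print("RMSE per spatial fold:", [round(s, 3) for s in fold_rmse])
```

Grouping by block rather than shuffling individual samples is the design choice that matters here: nearby soil samples tend to be similar, so a random split can leak spatial information into the test fold and make the reported RMSE look better than it would be in a new region.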
