Machine Learning Models for Soil Parameter Prediction Based on Satellite, Weather, Clay and Yield Data

Calvin Kammerlander, Viola Kolb, Marinus Luegmair, Lou Scheermann, Maximilian Schmailzl, Marco Seufert, Jiayun Zhang, Denis Dalic, Torsten Schön

TL;DR

This work tackles the challenge of predicting soil nutrient levels without laboratory tests by combining satellite imagery, weather, and ancillary data within a regression framework. A two-phase approach first builds a European baseline model using Sentinel-2 imagery and the LUCAS TOPSOIL dataset, then enriches its predictions with weather data, yield proxies, and Clay embeddings. Three ML algorithms (XGBoost, FCNN, Random Forest) are compared and evaluated with spatial cross-validation. Key contributions include a detailed data pipeline, model comparisons, and an extended feature analysis showing the value and limits of high-dimensional Clay embeddings, with soil-property predictions achieving competitive RMSEs and insights into feature importance. The study lays a scalable, reproducible foundation for precision fertilization in Africa and other under-resourced regions, while highlighting data gaps, notably the lack of timestamped African soil observations, that must be addressed to realize full generalization and impact.
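To make the idea of spatial evaluation concrete, the sketch below assigns sample locations to 4° × 4° grid cells (the cell size named in the caption of Figure 5) and places whole cells into training, validation, or test subsets, so that nearby samples never end up on both sides of a split. This is an illustrative single block split rather than the authors' procedure: the function names, the split fractions, and the random seed are all assumptions, and a full spatial cross-validation would rotate the held-out cells across folds.

```python
import math
import random


def grid_cell(lat, lon, size_deg=4.0):
    """Return the (row, col) index of the size_deg x size_deg cell containing a point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def spatial_split(points, size_deg=4.0, fractions=(0.7, 0.15, 0.15), seed=0):
    """Label each (lat, lon) point as 'train', 'val', or 'test' by grid cell.

    Whole cells are assigned to one subset, so no cell is shared between subsets.
    The fractions and seed are hypothetical defaults, not values from the paper.
    """
    # Collect the distinct cells that actually contain samples.
    cells = sorted({grid_cell(lat, lon, size_deg) for lat, lon in points})
    # Shuffle cells reproducibly before cutting them into subsets.
    rng = random.Random(seed)
    rng.shuffle(cells)

    n_train = round(fractions[0] * len(cells))
    n_val = round(fractions[1] * len(cells))

    label_of = {}
    for i, cell in enumerate(cells):
        if i < n_train:
            label_of[cell] = "train"
        elif i < n_train + n_val:
            label_of[cell] = "val"
        else:
            label_of[cell] = "test"

    # Each point inherits the label of the cell it falls into.
    return [label_of[grid_cell(lat, lon, size_deg)] for lat, lon in points]


if __name__ == "__main__":
    # Hypothetical sample locations (latitude, longitude) for illustration only.
    samples = [(48.1, 11.6), (49.0, 11.9), (40.0, -3.7), (60.2, 24.9)]
    print(spatial_split(samples))
```

Splitting by cell rather than by individual sample is the usual motivation for this kind of scheme: soil samples that lie close together tend to be correlated, so a purely random split can overstate how well a model generalizes to new regions.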

Abstract

Efficient nutrient management and precise fertilization are essential for advancing modern agriculture, particularly in regions striving to optimize crop yields sustainably. The AgroLens project addresses this challenge by developing Machine Learning (ML)-based methodologies to predict soil nutrient levels without reliance on laboratory tests. By leveraging state-of-the-art techniques, the project lays a foundation for actionable insights to improve agricultural productivity in resource-constrained areas, such as Africa. The approach begins with the development of a robust European model using the LUCAS Soil dataset and Sentinel-2 satellite imagery to estimate key soil properties, including phosphorus, potassium, nitrogen, and pH levels. This model is then enhanced by integrating supplementary features, such as weather data, harvest rates, and Clay AI-generated embeddings. This report details the methodological framework, data preprocessing strategies, and ML pipelines employed in this project. Advanced algorithms, including Random Forests, Extreme Gradient Boosting (XGBoost), and Fully Connected Neural Networks (FCNN), were implemented and fine-tuned for precise nutrient prediction. Results showcase robust model performance, with root mean square error values meeting stringent accuracy thresholds. By establishing a reproducible and scalable pipeline for soil nutrient prediction, this research paves the way for transformative agricultural applications, including precision fertilization and improved resource allocation in under-resourced regions like Africa.

Paper Structure

This paper contains 87 sections, 1 equation, 21 figures, 17 tables.

Figures (21)

  • Figure 1: Schematic Concept of the Training Process for Model and Data Usage for the Europe Model
  • Figure 2: Correlation coefficient matrix for the input data from Sentinel of the Europe model
  • Figure 3: Locations of Soil Sample Collection of the LUCAS 2018 TOPSOIL Dataset
  • Figure 4: Histogram for the target data nutrients pH, nitrogen, phosphorus and potassium in the LUCAS 2018 TOPSOIL dataset
  • Figure 5: Visualization of Training, Validation and Test Grids (Grid size: 4° × 4°)
  • ...and 16 more figures