Soil nitrogen forecasting from environmental variables provided by multisensor remote sensing images
Weiying Zhao, Ganzorig Chuluunbat, Aleksei Unagaev, Natalia Efremova
TL;DR
This work addresses forecasting soil nitrogen content by fusing multi-sensor remote sensing environmental variables with the LUCAS ground-truth dataset. The authors evaluate four tree-based algorithms (CatBoost, LightGBM, XGBoost, and ExtraTrees) with SHAP-driven feature selection and Bayesian hyperparameter optimization, on a dataset of 21,244 samples spanning croplands and grasslands. CatBoost consistently achieves the best accuracy, with improved $MAPE$ and $MAE$ and an $R^2$ of around $0.49$ on cropland, illustrating the value of feature selection and hyperparameter tuning on complex multi-sensor data. The study presents a scalable, generalizable framework for precision agriculture and environmental monitoring that leverages diverse satellite-derived features.
Abstract
This study introduces a framework for forecasting soil nitrogen content that combines multi-modal data, including multi-sensor remote sensing images, with advanced machine learning methods. We integrate the Land Use/Land Cover Area Frame Survey (LUCAS) database, which covers European and UK territory, with environmental variables from satellite sensors to create a dataset of novel features. We then test a broad range of machine learning algorithms, focusing on tree-based models such as CatBoost, LightGBM, and XGBoost. To assess the robustness of the approach, we evaluate the methods on several land cover classes, including croplands and grasslands. Our results show that the CatBoost model is more accurate than the other methods. This research advances agricultural management and environmental monitoring and demonstrates the potential of integrating multi-sensor remote sensing data with machine learning for environmental analysis.
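
The TL;DR reports accuracy as $MAPE$, $MAE$, and $R^2$. As a reading aid, the sketch below computes these three metrics using their standard textbook definitions. This summary does not state the exact formulas the authors used, so treat the definitions as an assumption. The function names and the sample nitrogen values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero (raises ZeroDivisionError).
    """
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical soil nitrogen values (illustrative only, not study data).
observed = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]

print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"MAPE = {mape(observed, predicted):.1f}%")
print(f"R^2  = {r2(observed, predicted):.3f}")
```

Under these definitions, an $R^2$ of about $0.49$ means the model explains roughly half of the variance in the observed nitrogen values. $MAE$ and $MAPE$ describe the typical size of an individual error in absolute and relative terms.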
