SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction
Nafiseh Kakhani, Moien Rangzan, Ali Jamali, Sara Attarchi, Seyed Kazem Alavipanah, Michael Mommert, Nikolaos Tziolas, Thomas Scholten
TL;DR
SSL-SoilNet introduces a hybrid multimodal framework that jointly processes image-based remote sensing data and climate time series through self-supervised contrastive learning, followed by supervised fine-tuning to predict soil organic carbon (SOC). The method leverages Vision Transformer backbones for imagery and Transformer backbones for climate data, enabling cross-modal representation alignment at each location. Compared with traditional supervised models and several ML baselines, SSL-SoilNet achieves higher accuracy (lower RMSE, higher R^2) and better predictive reliability on European LUCAS and US RaCA SOC datasets, with pronounced gains for skewed SOC distributions. The approach demonstrates strong potential for scalable, regionally adaptive digital soil mapping and can be extended to other soil properties and higher-resolution data for land management applications.
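To make the idea of cross-modal alignment concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective over paired image and climate embeddings. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not specify the exact loss, and the function names, the temperature value, and the use of plain Python lists instead of a deep learning framework are all hypothetical choices made for readability.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # i.e. the embedding of the other modality at the same location.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(img_emb, clim_emb, temperature=0.1):
    # img_emb[i] and clim_emb[i] are assumed to come from the same location.
    # temperature=0.1 is an illustrative value, not one taken from the paper.
    img = [l2_normalize(v) for v in img_emb]
    clim = [l2_normalize(v) for v in clim_emb]
    # Cosine similarity between every image and every climate embedding.
    logits = [
        [sum(a * b for a, b in zip(u, v)) / temperature for v in clim]
        for u in img
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-climate and climate-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy batch of two locations with 2-dimensional embeddings.
image_side = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(image_side, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_info_nce(image_side, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, `aligned_loss` is close to zero because each location's two embeddings coincide, while `swapped_loss` is large because every embedding is most similar to the wrong location. Minimizing such a loss on unlabeled data pulls the two modalities together per location before any SOC labels are used.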
Abstract
Soil organic carbon (SOC) is a fundamental component of terrestrial ecosystem functioning, playing a pivotal role in nutrient cycling, hydrological balance, and erosion mitigation. Precise mapping of SOC distribution is essential for quantifying ecosystem services, notably carbon sequestration and soil fertility enhancement. Digital soil mapping (DSM) uses statistical models and advanced technologies, including machine learning (ML), to map soil properties such as SOC from diverse data sources like satellite imagery, topography, remote sensing indices, and climate time series. Within ML, self-supervised learning (SSL), which exploits unlabeled data, has gained prominence in recent years. This study introduces a novel approach that learns the geographical link between multimodal features via self-supervised contrastive learning, employing pretrained Vision Transformers (ViT) for image inputs and Transformers for climate data, before fine-tuning the model with ground reference samples. The proposed approach was tested on two distinct large-scale datasets, and the results indicate that it outperforms traditional supervised learning models, which depend solely on labeled data. Furthermore, across several evaluation metrics (e.g., RMSE, MAE, and CCC), the proposed model is more accurate than conventional ML algorithms such as random forest and gradient boosting. The model is a robust tool for predicting SOC and contributes to the advancement of DSM techniques, thereby supporting land management and decision-making based on accurate information.
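For readers unfamiliar with the metrics named above, the sketch below computes RMSE, MAE, and CCC from paired observed and predicted values. It assumes that CCC refers to Lin's concordance correlation coefficient, computed here with population (divide-by-n) variances. The function names and the toy values are illustrative only and are not results from the study.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large errors more strongly.
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def ccc(y_true, y_pred):
    # Lin's concordance correlation coefficient (assumed meaning of "CCC"):
    # 2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2).
    # It equals 1 only when predictions fall on the 1:1 line.
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Toy values (not data from the paper): predictions biased upward by 1 unit.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(observed, predicted), mae(observed, predicted), ccc(observed, predicted))
```

In this toy case the predictions are perfectly correlated with the observations, yet the constant bias lowers CCC below 1. This sensitivity to bias is why CCC is often reported alongside RMSE and MAE in soil mapping studies.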
