SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction

Nafiseh Kakhani, Moien Rangzan, Ali Jamali, Sara Attarchi, Seyed Kazem Alavipanah, Michael Mommert, Nikolaos Tziolas, Thomas Scholten

TL;DR

SSL-SoilNet introduces a hybrid multimodal framework that jointly processes image-based remote sensing data and climate time series through self-supervised contrastive learning, followed by supervised fine-tuning to predict soil organic carbon (SOC). The method leverages Vision Transformer backbones for imagery and Transformer backbones for climate data, enabling cross-modal representation alignment at each location. Compared with traditional supervised models and several ML baselines, SSL-SoilNet achieves higher accuracy (lower RMSE, higher R^2) and better predictive reliability on European LUCAS and US RaCA SOC datasets, with pronounced gains for skewed SOC distributions. The approach demonstrates strong potential for scalable, regionally adaptive digital soil mapping and can be extended to other soil properties and higher-resolution data for land management applications.

Abstract

Soil Organic Carbon (SOC) constitutes a fundamental component of terrestrial ecosystem functionality, playing a pivotal role in nutrient cycling, hydrological balance, and erosion mitigation. Precise mapping of SOC distribution is imperative for the quantification of ecosystem services, notably carbon sequestration and soil fertility enhancement. Digital soil mapping (DSM) leverages statistical models and advanced technologies, including machine learning (ML), to accurately map soil properties, such as SOC, utilizing diverse data sources like satellite imagery, topography, remote sensing indices, and climate series. Within the domain of ML, self-supervised learning (SSL), which exploits unlabeled data, has gained prominence in recent years. This study introduces a novel approach that aims to learn the geographical link between multimodal features via self-supervised contrastive learning, employing pretrained Vision Transformers (ViT) for image inputs and Transformers for climate data, before fine-tuning the model with ground reference samples. The proposed approach has been tested on two distinct large-scale datasets, with results indicating its superiority over traditional supervised learning models, which depend solely on labeled data. Furthermore, across various evaluation metrics (e.g., RMSE, MAE, and CCC), the proposed model exhibits higher accuracy than conventional ML algorithms such as random forest and gradient boosting. This model is a robust tool for predicting SOC and contributes to the advancement of DSM techniques, thereby facilitating land management and decision-making processes based on accurate information.
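
The evaluation metrics named in the abstract can be made concrete with a short example. The following is a minimal sketch using the standard definitions of RMSE, MAE, $R^2$, and Lin's concordance correlation coefficient (CCC); the function name `regression_metrics` and the toy values are assumptions for illustration and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2 and Lin's CCC for two equal-length sequences.

    Hypothetical helper for illustration; assumes y_true is not constant.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Population (1/n) variances and covariance, as in Lin's CCC definition.
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    # R^2 = 1 - SS_res / SS_tot
    r2 = 1.0 - sum(e * e for e in errors) / (n * var_t)
    # CCC penalizes both low correlation and systematic bias (mean shift).
    ccc = 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

    return {"rmse": rmse, "mae": mae, "r2": r2, "ccc": ccc}


if __name__ == "__main__":
    # Toy SOC values: predictions are perfectly correlated but biased by +1.
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 3.0, 4.0]
    print(regression_metrics(observed, predicted))
```

In the toy case the predictions correlate perfectly with the observations yet carry a constant offset, so CCC falls well below 1 while Pearson correlation would not. This is one reason CCC is often reported alongside RMSE and MAE in soil mapping studies.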

Paper Structure (26 sections, 8 equations, 8 figures, 6 tables, 1 algorithm)


Figures (8)

  • Figure 1: Conceptual framework of this study. A large amount of unlabeled data is leveraged for self-supervised training, followed by fine-tuning on a limited number of available soil samples serving as ground truth.
  • Figure 2: Simplified architecture illustrating the proposed approach. The training process involves two steps. In the initial phase, contrastive learning is utilized, wherein both a ViT and a transformer are trained. They process pairs of image-based and time-series input data obtained through random sampling. The ViT and transformer extract intermediate representations, $\mathcal{I}$ and $\mathcal{T}$, used for self-supervised training. In the second step, fine-tuning is performed using the same architecture with ground truth samples. A regression head is added to predict the target output, SOC.
  • Figure 3: Histogram and kernel density estimate of the SOC (g/kg) distribution for the LUCAS dataset.
  • Figure 4: Histogram and kernel density estimate of the log-transformed SOC (Mg/ha) distribution for the RaCA dataset.
  • Figure 5: The process of raster-based data preparation.
  • ...and 3 more figures
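
The contrastive step described in the Figure 2 caption pairs an image representation $\mathcal{I}$ with a time-series representation $\mathcal{T}$ from the same location. One common way to align such pairs is a symmetric InfoNCE-style loss, sketched below. This is an illustrative assumption and not necessarily the authors' loss, which this summary does not specify. The names `l2_normalize` and `info_nce`, the temperature of 0.1, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(image_emb, climate_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row k of image_emb and row k of climate_emb are assumed to come from
    the same location (a positive pair); every other row in the batch
    acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    clis = [l2_normalize(v) for v in climate_emb]
    n = len(imgs)

    # Cosine similarity between every image and every climate embedding.
    logits = [
        [sum(a * b for a, b in zip(i, c)) / temperature for c in clis]
        for i in imgs
    ]

    def cross_entropy(row, target):
        # Numerically stable log-sum-exp minus the positive-pair logit.
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        return lse - row[target]

    # Image-to-climate direction: each row should pick its own column.
    i2c = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Climate-to-image direction: same computation on the transpose.
    columns = [list(col) for col in zip(*logits)]
    c2i = sum(cross_entropy(columns[k], k) for k in range(n)) / n
    return 0.5 * (i2c + c2i)


if __name__ == "__main__":
    image_batch = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]   # same-location pairs agree
    shuffled = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched
    print("matched loss: ", info_nce(image_batch, matched))
    print("shuffled loss:", info_nce(image_batch, shuffled))
```

The loss is small when each location's image and climate embeddings are closer to each other than to embeddings from other locations, and large when they are mismatched. In the fine-tuning step a regression head would then be attached to the pretrained encoders, which this sketch does not cover.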