A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands

Tong Lei, Brian N. Bailey

TL;DR

The study addresses the challenge of simulating soil VIS-NIR reflectance spectra across diverse soils by introducing SOGM, a diffusion-based framework that generates spectra from text-based soil property descriptions. It combines a spectral padding module, a transformer-based property embedding, and a denoising diffusion process to produce spectra even with incomplete inputs, and extends to wet spectra predictions and 3D radiative-image generation via Helios. The approach leverages a large, heterogeneous dataset (≈180k spectra) and demonstrates robust performance on unseen datasets, with RMSEs around 5% for full-input conditions and strong correlation ($r^2$ up to ~0.9) for several bands; padding and wet-spectrum modules further enhance practical utility. By integrating with 3D ray-tracing and PROSAIL-compatible workflows, SOGM enables realistic soil-plant imagery and radiative-transfer studies, offering a flexible, open-source tool for remote sensing and ecosystem research.

Abstract

Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and for training machine learning models, yet it is difficult because of the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) was developed for simulating soil reflectance spectra from soil property inputs. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values, rather than only from numerical values and labels in binary vector format. The generative model can simulate output spectra from an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were built to complement the SOGM: a spectral padding model that fills in the gaps for spectra that do not cover the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that estimates the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling it with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation models used in remote sensing research, such as PROSAIL. Tests of the SOGM on new datasets that were not included in model training showed that the model can generate reasonable soil reflectance spectra from the available property inputs. The presented models are openly accessible at: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.
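The abstract states that SOGM is based on the DDPM. As a rough illustration of what that implies, the sketch below shows the standard DDPM forward (noising) step applied to a 1D spectrum, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, where the network's training target is the noise $\epsilon$. It is not taken from the SOGM code: the linear noise schedule, the 1000 steps, the toy spectrum, and all function names are assumptions made for illustration only.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(spectrum, t, alpha_bars, rng):
    """Forward diffusion: returns the noisy spectrum x_t and the noise eps.

    During training, a denoising network would receive x_t (plus the
    property conditioning) and be asked to predict eps.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in spectrum]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(spectrum, eps)]
    return noisy, eps


def recover_clean(noisy, eps, t, alpha_bars):
    """Invert the forward step when the noise is known exactly."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise_scale * e) / signal_scale for x, e in zip(noisy, eps)]


if __name__ == "__main__":
    # Toy reflectance curve on 400-2499 nm at 1 nm spacing (2100 bands).
    wavelengths = range(400, 2500)
    spectrum = [0.1 + 0.3 * (1.0 - math.exp(-(w - 400) / 600.0)) for w in wavelengths]

    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)
    noisy, eps = add_noise(spectrum, 200, alpha_bars, rng)
    restored = recover_clean(noisy, eps, 200, alpha_bars)
    print(len(noisy), max(abs(a - b) for a, b in zip(spectrum, restored)))
```

With a perfect noise estimate the clean spectrum is recovered exactly, which is why predicting $\epsilon$ is a sufficient training target; in practice the reverse process removes the predicted noise gradually over many steps.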

Paper Structure (18 sections, 6 equations, 14 figures, 6 tables)

Figures (14)

  • Figure 1: Schematic representation of the training and architecture of the spectra padding model. A portion of the input spectra is randomly set to 0 (indicated by the grey shadow), and the down-sampled, full-range input spectra are reconstructed. The embedding block consists of transformer layers, while the encoder and decoder blocks are composed of 1D CNN layers.
  • Figure 2: Schematic representation of the denoising diffusion model. The inputs consist of the sum of random noise and soil spectra, along with the corresponding soil properties. During model training, the target output is the random noise that was added to the input. The blue blocks represent 1D CNN layers, and the orange blocks represent transformer layers.
  • Figure 3: Example illustration of the spectral denoising process. During the reverse diffusion process, the soil spectra are progressively recovered as the time step decreases. Four example spectra are shown at time steps ($t$): 200, 50, 10, and 0.
  • Figure 4: Schematic representation of the 3D radiation model for image generation. A ray-tracing-based camera model is used to simulate radiation that is emitted from a radiation source (e.g., sun, LED light) and reaches the camera after being scattered by objects in the scene. The PROSPECT-based leaf optical model and SOGM can generate the leaf and soil optical properties, respectively. Finally, the simulated camera generates resulting images that can be arbitrarily auto-annotated.
  • Figure 5: Full-range spectra obtained by applying the spectra padding model to (a) Barthès2023 and (b) MARMIT2020 datasets. The solid curves represent the original spectra, and the dotted curves are the reconstructed portion of the spectra.
  • ...and 9 more figures
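
The Figure 1 caption describes how the spectra padding model is trained: part of each input spectrum is randomly set to 0, and the model reconstructs the down-sampled, full-range spectrum. A minimal sketch of how such a training pair could be built is given below. The contiguous gap, the maximum gap fraction, the block-averaging down-sampler, the factor of 10, and all function names are assumptions for illustration and are not taken from the paper or its repository.

```python
import random


def mask_random_range(spectrum, rng, max_fraction=0.5):
    """Zero out one random contiguous range of bands (assumed masking scheme).

    Returns the masked copy and the (start, stop) indices of the gap.
    """
    n = len(spectrum)
    width = rng.randint(1, max(1, int(n * max_fraction)))
    start = rng.randint(0, n - width)
    masked = list(spectrum)
    for i in range(start, start + width):
        masked[i] = 0.0
    return masked, (start, start + width)


def downsample(spectrum, factor):
    """Average consecutive blocks of `factor` bands (assumed down-sampling)."""
    return [
        sum(spectrum[i:i + factor]) / factor
        for i in range(0, len(spectrum) - factor + 1, factor)
    ]


def make_training_pair(spectrum, rng, factor=10):
    """Input: spectrum with a zeroed gap. Target: down-sampled full spectrum."""
    masked, gap = mask_random_range(spectrum, rng)
    target = downsample(spectrum, factor)
    return masked, target, gap


if __name__ == "__main__":
    # Toy full-range spectrum: 2100 bands for 400-2499 nm at 1 nm spacing.
    full = [0.15 + 0.0001 * i for i in range(2100)]
    masked, target, gap = make_training_pair(full, random.Random(42))
    print(len(masked), len(target), gap)
```

Zero is used as the gap marker here because the caption says the masked portion is set to 0; a real pipeline would need to make sure genuine reflectance values cannot be confused with that marker.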