Physics informed Transformer-VAE for biophysical parameter estimation: PROSAIL model inversion in Sentinel-2 imagery
Prince Mensah, Pelumi Victor Aderinto, Ibrahim Salihu Yusuf, Arnu Pretorius
TL;DR
This work addresses the challenge of retrieving canopy biophysical variables from satellite data without needing real-image labels for training. It introduces a physics-informed Transformer-VAE that embeds the PROSAIL radiative transfer model (RTM) as a differentiable decoder. The model is trained exclusively on synthetic PROSAIL simulations to infer posterior distributions over canopy parameters such as leaf area index (LAI) and canopy chlorophyll content (CCC). On real field datasets (FRM4Veg and BelSAR), it reaches accuracy competitive with methods trained on real imagery, while providing uncertainty quantification via the latent distribution. By coupling physical model constraints with a Transformer encoder, the approach yields physically plausible inversions and suggests a viable path to global, calibration-free vegetation trait products from Sentinel-2 imagery, with potential extensions to hyperspectral data and additional RTMs.
Abstract
Accurate retrieval of vegetation biophysical variables from satellite imagery is crucial for ecosystem monitoring and agricultural management. In this work, we propose a physics-informed Transformer-VAE architecture to invert the PROSAIL radiative transfer model (RTM) for simultaneous estimation of key canopy parameters from Sentinel-2 data. Unlike previous hybrid approaches that require real satellite images for self-supervised training, our model is trained exclusively on simulated data, yet achieves performance on par with state-of-the-art methods that utilize real imagery. The Transformer-VAE incorporates the PROSAIL model as a differentiable physical decoder, ensuring that inferred latent variables correspond to physically plausible leaf and canopy properties. We demonstrate retrieval of leaf area index (LAI) and canopy chlorophyll content (CCC) on real-world field datasets (FRM4Veg and BelSAR) with accuracy comparable to models trained with real Sentinel-2 data. Our method requires no in-situ labels or calibration on real images, offering a cost-effective and self-supervised solution for global vegetation monitoring. The proposed approach illustrates how integrating physical models with advanced deep networks can improve the inversion of RTMs, opening new prospects for large-scale, physically constrained remote sensing of vegetation traits.
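To make the idea of a differentiable physical decoder concrete, the following is a minimal sketch of the encode, sample, decode loop in plain Python. It is not the authors' implementation. The two-band `toy_decoder` is a hypothetical stand-in for PROSAIL, the bounds in `BOUNDS` are illustrative values, and the encoder outputs (`mu`, `log_var`) are supplied by hand where the paper uses a Transformer. Only the reconstruction term is shown; a VAE objective would typically add a KL regulariser.

```python
import math
import random

# Illustrative bounds only (assumed, not taken from the paper).
# lai: leaf area index [m2/m2]; cab: leaf chlorophyll content [ug/cm2].
BOUNDS = {"lai": (0.0, 10.0), "cab": (10.0, 80.0)}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_physical(mu, log_var, rng):
    """Draw z = mu + sigma * eps per variable, then squash z into its physical range."""
    params = {}
    for name, (low, high) in BOUNDS.items():
        sigma = math.exp(0.5 * log_var[name])
        z = mu[name] + sigma * rng.gauss(0.0, 1.0)
        params[name] = low + (high - low) * sigmoid(z)
    return params


def toy_decoder(params):
    """Toy two-band (red, NIR) canopy reflectance: a stand-in for PROSAIL, not PROSAIL."""
    cover = 1.0 - math.exp(-0.5 * params["lai"])      # Beer-Lambert-style canopy cover
    leaf_red = 0.3 * math.exp(-0.03 * params["cab"])  # more chlorophyll, darker red band
    red = (1.0 - cover) * 0.2 + cover * leaf_red      # 0.2: assumed soil reflectance (red)
    nir = (1.0 - cover) * 0.3 + cover * 0.5           # 0.3 / 0.5: assumed soil / canopy NIR
    return (red, nir)


def reconstruction_loss(observed, mu, log_var, rng, n_samples=32):
    """Monte Carlo estimate of the squared error between decoded and observed bands."""
    total = 0.0
    for _ in range(n_samples):
        simulated = toy_decoder(sample_physical(mu, log_var, rng))
        total += sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return total / n_samples


def to_latent(name, value):
    """Inverse of the squashing in sample_physical (logit of the rescaled value)."""
    low, high = BOUNDS[name]
    u = (value - low) / (high - low)
    return math.log(u / (1.0 - u))


# Simulated "observation" from known parameters, mirroring training on synthetic data.
rng = random.Random(0)
truth = {"lai": 3.0, "cab": 40.0}
observed = toy_decoder(truth)

narrow = {"lai": -20.0, "cab": -20.0}  # log-variance of a very confident posterior
good_mu = {name: to_latent(name, value) for name, value in truth.items()}
bad_mu = {"lai": 0.0, "cab": 0.0}      # decodes to the midpoint of each range

good_loss = reconstruction_loss(observed, good_mu, narrow, rng)
bad_loss = reconstruction_loss(observed, bad_mu, narrow, rng)
print(good_loss < bad_loss)
```

Because every sample is squashed into `BOUNDS` before decoding, the decoder only ever sees physically admissible inputs, which is the sense in which the latent variables stay interpretable. In an automatic-differentiation framework the same loss would be back-propagated through the decoder into the encoder weights.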
