
SpecTM: Spectral Targeted Masking for Trustworthy Foundation Models

Syed Usama Imtiaz, Mitra Nasr Azadani, Nasrin Alamdari

Abstract

Foundation models are increasingly being developed for Earth observation (EO), yet they often rely on stochastic masking that does not explicitly enforce physics constraints. This is a critical trustworthiness limitation, particularly for predictive models that guide public health decisions. In this work, we propose SpecTM (Spectral Targeted Masking), a physics-informed masking design that encourages the reconstruction of targeted bands from cross-spectral context during pretraining. To achieve this, we developed an adaptable multi-task self-supervised learning (SSL) framework (band reconstruction, bio-optical index inference, and 8-day-ahead temporal prediction) that encodes spectrally intrinsic representations via joint optimization, and evaluated it on a downstream microcystin concentration regression task using NASA PACE hyperspectral imagery over Lake Erie. SpecTM achieves R^2 = 0.695 for current-week prediction and R^2 = 0.620 for 8-day-ahead prediction, surpassing all baseline models, with gains of +34% (over Ridge, R^2 = 0.51) and +99% (over SVR, R^2 = 0.31), respectively. Our ablation experiments show that targeted masking improves predictions by +0.037 R^2 over random masking. Furthermore, SpecTM outperforms strong baselines with 2.2x superior label efficiency under extreme label scarcity. SpecTM enables physics-informed representation learning across EO domains and improves the interpretability of foundation models.
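To make the contrast between targeted and random masking concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The 5 nm wavelength grid, the placeholder reflectance values, the zero fill value, and the function names `targeted_mask`, `random_mask`, and `apply_mask` are all assumptions made for illustration. The 615 to 720 nm window is borrowed from the diagnostic wavelength range mentioned in the Figure 3 caption, and using it as the mask window is likewise an assumption.

```python
import random

def targeted_mask(wavelengths_nm, lo_nm=615.0, hi_nm=720.0):
    """Return True for bands inside the targeted window (bands to hide).

    The window bounds are an assumption taken from the Figure 3 caption.
    """
    return [lo_nm <= w <= hi_nm for w in wavelengths_nm]

def random_mask(n_bands, n_masked, seed=0):
    """Baseline: hide the same number of bands, chosen uniformly at random."""
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n_bands), n_masked))
    return [i in hidden for i in range(n_bands)]

def apply_mask(spectrum, mask, fill=0.0):
    """Split a spectrum into encoder input and reconstruction targets.

    Hidden bands are replaced by `fill` in the input; their true values
    are returned separately as the targets to reconstruct.
    """
    visible = [fill if m else v for v, m in zip(spectrum, mask)]
    targets = [v for v, m in zip(spectrum, mask) if m]
    return visible, targets

# Hypothetical 5 nm grid from 400 to 800 nm (not the actual PACE band set).
wavelengths = [400.0 + 5.0 * i for i in range(81)]
# Placeholder reflectance values, for illustration only.
spectrum = [0.01 + 0.0001 * i for i in range(81)]

t_mask = targeted_mask(wavelengths)
r_mask = random_mask(len(wavelengths), sum(t_mask))
visible, targets = apply_mask(spectrum, t_mask)
```

Both masks hide the same number of bands. Only the choice of which bands to hide differs, which is the variable the ablation isolates.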

Paper Structure (14 sections, 2 equations, 4 figures)

Figures (4)

  • Figure 1: End-to-end experimental design for SpecTM pretraining and microcystin quantification using NASA PACE hyperspectral imagery and limited ground-truth labels.
  • Figure 2: Experimental results. (a) SpecTM outperforms all baselines for both current-week (+34%) and 8-day-ahead (+99%) microcystin prediction. (b) Ablation analysis: targeted masking improves over random masking by +0.037 $R^2$; SSL pretraining provides a +0.18 $R^2$ gain over random initialization.
  • Figure 3: SSL pretraining validation. (a) Single-sample spectral reconstruction showing visible context bands (blue), true masked values (red), and predicted values (green) across diagnostic wavelengths (615--720 nm). (b) Aggregate reconstruction accuracy across 4,096 samples ($r{=}1.000$, RMSE${=}0.018$). Shaded region: 95% CI.
  • Figure 4: Label efficiency under data scarcity. SpecTM achieves $1.8\times$ (current-week) and $2.2\times$ (8-day-ahead) improvement over the baseline at 5% labeled data. Shaded regions: $\pm$1 SD across five stratified subsamples.
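
The label-efficiency protocol in Figure 4 (five stratified subsamples at 5% labeled data) could be set up roughly as follows. This is a sketch under stated assumptions, not the authors' procedure. Stratifying by quantile bins of the target value, the choice of four bins, the placeholder labels, and the function name `stratified_subsample` are all hypothetical.

```python
import random

def stratified_subsample(labels, fraction, n_bins=4, seed=0):
    """Pick roughly `fraction` of the indices from each quantile bin.

    Assumption: strata are equal-sized quantile bins of the label values.
    """
    rng = random.Random(seed)
    # Sort sample indices by label so contiguous slices form quantile bins.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    chosen = []
    for start in range(0, len(order), bin_size):
        stratum = order[start:start + bin_size]
        k = max(1, round(fraction * len(stratum)))
        chosen.extend(rng.sample(stratum, k))
    return sorted(chosen)

# Placeholder labels (not real microcystin measurements).
labels = [0.1 * i for i in range(400)]

# Five subsamples at 5% labeled data, one per seed.
subsets = [stratified_subsample(labels, 0.05, seed=s) for s in range(5)]
```

Each subset would then be used to fit the downstream regressor, and the mean and standard deviation of the resulting scores across the five subsets would give the curve and the shaded band.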