STARS: Sensor-agnostic Transformer Architecture for Remote Sensing

Ethan King, Jaime Rodriguez, Diego Llanes, Timothy Doster, Tegan Emerson, James Koch

TL;DR

A Universal Spectral Representation (USR) is introduced that leverages sensor meta-data, such as sensing kernel specifications and sensing wavelengths, to encode spectra obtained from any spectral instrument into a common representation, such that a single model can ingest data from any sensor.

Abstract

We present a sensor-agnostic spectral transformer as the basis for spectral foundation models. To that end, we introduce a Universal Spectral Representation (USR) that leverages sensor meta-data, such as sensing kernel specifications and sensing wavelengths, to encode spectra obtained from any spectral instrument into a common representation, such that a single model can ingest data from any sensor. Furthermore, we develop a methodology for pre-training such models in a self-supervised manner using a novel random sensor-augmentation and reconstruction pipeline to learn spectral features independent of the sensing paradigm. We demonstrate that our architecture can learn sensor-independent spectral features that generalize effectively to sensors not seen during training. This work sets the stage for training foundation models that can both leverage and be effective for the growing diversity of spectral data.

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: A multi-band image is displayed in (a) where each channel corresponds to a different band of a spectral sensor. In (b), the notional Sensor Response Function (SRF) for each sensor band is depicted. A pixel from the image in (a) contains a measurement value corresponding to each SRF; in this case, each pixel is an RGB triplet. These measurements contain no information about the originating SRFs, which presents a challenge: without special treatment, a model cannot distinguish between shuffled bands, recognize redundant information, or discern between measurements obtained from narrow (hyperspectral) or wide (multi-band) SRFs. Towards this end, we introduce Universal Spectral Representations (USRs, shown in (b)), which encode spectral measurements into a common representation that contains both measurement values as well as SRF and spectral coverage information. USRs act as an approximation for real sensor physics, leveraging positional encodings and known SRFs to produce a feature-rich representation of band-wise measurements that are consistent for any SRF.
  • Figure 2: Three notional Sensor Response Functions (SRFs) are shown as dashed rectangles of different widths and heights (as scaled by sensor input). Through recasting these SRFs into our Universal Spectral Representation (USR, Eq. \ref{eq:USR_band_approx}), the combined measurement-SRF can be fed into our transformer architecture. Critically, our choice for constructing a USR allows one to approximately recover the embedded information, such as band width and height, shown here as solid lines.
  • Figure 3: Graphical representation of the augmentation module on single-pixel spectra. The product of SRF bands with the original spectra produces the augmented bands, and integrating those products produces an estimated measurement for each band.
  • Figure 4: Comparison of the model's spectral reconstructions for a single test-set pixel, with the pixel input to the model through four different sensors not seen during training.
  • Figure 5: A visualization of the three-dimensional learned model embedding for one of the experiments. Plots show the model encoder output for test-set pixels input to the model through a) an RGB sensor, and b) a CASI sensor. The points are colored by the class label of the corresponding pixel. For clarity, the Houston dataset classes (Healthy Grass, Stressed Grass, Trees) and (Road, Highway, Parking Lot 1, Parking Lot 2) were combined into two classes, vegetation and asphalt, respectively. Further, the two mixed-pixel classes, residential and commercial, are excluded from the plot.