USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery
Jeremy Irvin, Lucas Tao, Joanne Zhou, Yuntao Ma, Langston Nashold, Benjamin Liu, Andrew Y. Ng
TL;DR
USat is a unified self-supervised encoder for multi-sensor remote sensing data with heterogeneous spectral bands and varying ground sampling distances. The encoder uses per-band patch projections, spectral-group pooling, and a combination of superpositional, spectral-group, and sensor encodings to maintain geospatial alignment across sensors, enabling robust MAE-style pretraining (USatMAE). Empirical results from pretraining on USatlas show that multi-sensor pretraining generally outperforms single-sensor baselines on EuroSAT, BigEarthNet, and METER-ML, with pronounced benefits in low-data regimes. The work also reports performance competitive with ImageNet pretraining and highlights practical implications for multi-sensor remote sensing, including improved transferability and flexibility in spectral-band usage.
Abstract
Large, self-supervised vision models have led to substantial advancements in automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data, whose rich multi-sensor, multi-spectral, and temporal structure provides massive amounts of self-labeled data for self-supervised pre-training. In this work, we develop a new encoder architecture called USat that accepts multi-spectral data from multiple sensors as input for self-supervised pre-training. USat is a vision transformer with modified patch projection layers and positional encodings to model spectral bands with varying spatial scales from multiple sensors. We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art self-supervised MAE models trained on remote sensing data on multiple remote sensing benchmark datasets (up to 8%) and leads to improvements in low data regimes (up to 7%). Code and pre-trained weights are available at https://github.com/stanfordmlgroup/USat.
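To make the idea of patch projections over bands with varying spatial scales concrete, the following is a minimal sketch and not the released implementation. It assumes that the patch size in pixels is chosen inversely to each band's ground sampling distance (GSD), so that every band yields the same grid of tokens over the same ground footprint. The function names, band names, tile extent, patch footprint, and embedding dimension are all hypothetical, and randomly initialised weights stand in for learned projections.

```python
import random


def patch_size_px(ground_patch_m, gsd_m):
    """Pixels per patch side so that one patch spans ground_patch_m metres."""
    size, rem = divmod(ground_patch_m, gsd_m)
    if rem or size < 1:
        raise ValueError("ground patch must be a positive multiple of the GSD")
    return size


def patchify(band, p):
    """Split a 2D band (list of rows) into flattened p x p patches, row-major."""
    h, w = len(band), len(band[0])
    if h % p or w % p:
        raise ValueError("band dimensions must be divisible by the patch size")
    return [
        [band[r + i][c + j] for i in range(p) for j in range(p)]
        for r in range(0, h, p)
        for c in range(0, w, p)
    ]


def make_projection(p, dim, rng):
    """Random dim x (p*p) matrix standing in for a learned per-band projection."""
    return [[rng.gauss(0.0, 0.02) for _ in range(p * p)] for _ in range(dim)]


def project(patches, weights):
    """Apply a linear projection (no bias) to every flattened patch."""
    return [
        [sum(w * x for w, x in zip(row, patch)) for row in weights]
        for patch in patches
    ]


def embed_bands(bands, gsd_m, ground_patch_m, dim, seed=0):
    """Embed each band with its own projection; returns {band name: tokens}."""
    rng = random.Random(seed)
    tokens = {}
    for name, band in bands.items():
        p = patch_size_px(ground_patch_m, gsd_m[name])
        tokens[name] = project(patchify(band, p), make_projection(p, dim, rng))
    return tokens


# Hypothetical 160 m x 160 m tile observed by a 10 m band and a 20 m band.
extent_m = 160
gsd_m = {"red_10m": 10, "swir_20m": 20}
bands = {
    name: [[float(r + c) for c in range(extent_m // g)] for r in range(extent_m // g)]
    for name, g in gsd_m.items()
}
tokens = embed_bands(bands, gsd_m, ground_patch_m=80, dim=4)
for name, toks in tokens.items():
    print(name, len(toks), "tokens of dimension", len(toks[0]))
```

Under these assumptions both bands produce four tokens, one per 80 m ground cell, even though the 10 m band uses 8 x 8 pixel patches and the 20 m band uses 4 x 4 pixel patches. Tokens at the same grid index then describe the same location, which is the kind of alignment that shared positional encodings can exploit.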
