TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C. Lisaius, Markus Immitzer, Toby Jackson, James Ball, David A. Coomes, Anil Madhavapeddy, Andrew Blake, Srinivasan Keshav

TL;DR

TESSERA addresses irregular Earth Observation time series by learning pixel-wise, multi-modal embeddings that are invariant to temporal sampling, using a dual SAR/optical encoder and a large Barlow Twins (BT)-based projector. It introduces the d-pixel temporal representation, global shuffling, and mix-up regularization, producing 128-D embeddings that are quantized to 8-bit and released globally as annual maps at 10 m resolution, with the open GeoTessera library. Across six downstream benchmarks for classification, segmentation, and regression, TESSERA achieves state-of-the-art accuracy with high label efficiency, often needing only lightweight heads and minimal computation. This Embeddings-as-Data approach democratizes access to high-performance EO features, enabling large-scale retrieval and inference with practical tools while maintaining strong performance under cloudiness and data sparsity.
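
To make the 8-bit storage idea concrete, here is a minimal sketch of how a 128-D float embedding could be quantized to int8 and recovered. The symmetric per-vector scale used here is an assumption for illustration; the summary does not state which quantization scheme TESSERA actually uses, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8(emb):
    """Symmetric per-vector int8 quantization (assumed scheme, not taken from the paper)."""
    emb = np.asarray(emb, dtype=np.float32)
    # One scale per embedding vector, so the largest magnitude maps to 127.
    max_abs = np.max(np.abs(emb), axis=-1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(emb / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

# Synthetic stand-in for four pixels with 128-D embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 128)).astype(np.float32)

q, scale = quantize_int8(embeddings)
recovered = dequantize_int8(q, scale)
max_err = float(np.max(np.abs(recovered - embeddings)))
```

Under this scheme the per-element reconstruction error is bounded by half a quantization step, and each pixel costs 128 bytes plus one scale value instead of 512 bytes in float32.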

Abstract

Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regularization to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10 m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for retrieval and inference at planetary scale. The model training/inference code, downstream task code, and pre-generated embeddings can be accessed at https://github.com/ucam-eo
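
The training signal described in the abstract can be sketched as follows: two views of each pixel are built by randomly sampling valid observations, both are encoded, and the standard Barlow Twins objective pushes the cross-correlation of the two embedding batches toward the identity. This is a sketch under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for the real SAR/optical encoder, the data are synthetic noise, and the values of `lam`, the number of sampled steps, and the array shapes are illustrative only.

```python
import numpy as np

def sample_view(series, valid, n_steps, rng):
    """Pick up to n_steps valid observations at random, keeping temporal order."""
    idx = np.flatnonzero(valid)
    chosen = np.sort(rng.choice(idx, size=min(n_steps, idx.size), replace=False))
    return series[chosen]

def toy_encoder(view, weights):
    """Hypothetical stand-in encoder: mean over time, then a linear map."""
    return view.mean(axis=0) @ weights

def barlow_twins_loss(z1, z2, lam=5e-3, eps=1e-6):
    """Barlow Twins objective on two (batch, dim) embedding matrices."""
    n = z1.shape[0]
    # Standardize every embedding dimension across the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    c = z1.T @ z2 / n  # (dim, dim) cross-correlation matrix
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy term
    return float(on_diag + lam * off_diag)

rng = np.random.default_rng(0)
batch, t_steps, bands, dim = 64, 40, 10, 16  # illustrative sizes
weights = rng.normal(size=(bands, dim))
series = rng.normal(size=(batch, t_steps, bands))  # synthetic pixel time series
valid = rng.random((batch, t_steps)) > 0.5         # e.g. cloud-free flags
valid[:, 0] = True  # guarantee at least one valid observation per pixel

# Two independently sampled temporal views of every pixel.
z_a = np.stack([toy_encoder(sample_view(s, v, 8, rng), weights)
                for s, v in zip(series, valid)])
z_b = np.stack([toy_encoder(sample_view(s, v, 8, rng), weights)
                for s, v in zip(series, valid)])

loss = barlow_twins_loss(z_a, z_b)
loss_identical = barlow_twins_loss(z_a, z_a)
```

Comparing `loss` with `loss_identical` shows the intent of the objective: when both views yield the same embedding the invariance term vanishes, so minimizing the loss rewards an encoder whose output does not depend on which valid observations happened to be sampled.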

Paper Structure

This paper contains 70 sections, 14 equations, 17 figures, 8 tables, 2 algorithms.

Figures (17)

  • Figure 1: TESSERA advances the embedding-as-data approach for Earth Observation. It delivers (a) analysis-ready global-scale products, (b) pixel-wise, high-fidelity representations, and has (c) state-of-the-art downstream accuracy.
  • Figure 2: Downstream adaptation paradigms. (a–b) Most RSFMs require fine-tuning the encoder or training task-specific decoders and often do not expose intermediate embeddings. (b–c) TESSERA produces fixed, task-agnostic embeddings that plug into lightweight heads, avoiding backbone fine-tuning.
  • Figure 3: Overview of the TESSERA processing pipeline. (a) Overall architecture: multi-temporal Sentinel-1/2 observations are converted into modality-specific d-pixels, augmented twice, and encoded by dual branches with shared weights to produce compact 128-D embeddings. (b) Zoom-in on the Sentinel-1 branch: temporal sequences are processed by a 4-block Transformer followed by GRU pooling to capture dynamic backscatter patterns before fusion into the joint TESSERA embedding.
  • Figure 4: Label-efficient crop classification on Austrian Crop. (a) Weighted F1 vs. training label ratio using a small head on frozen embeddings. (b) Few-shot performance. TESSERA attains strong accuracy with very few labels; error bars denote variation over runs.
  • Figure 5: Biomassters AGB regression. (a) RMSE/$R^2$/MB (mean bias) vs. label fraction. (b) Predicted AGB vs. ground-truth AGB. (c) Spatial AGB maps from TESSERA trained with 4% labels. (d) Predicted vs. ground-truth scatter. TESSERA closely tracks the task-specific winner with significantly fewer labels.
  • ...and 12 more figures